Partition Properties

Alignment partition Run name Rate multiplier Sites
3ad39b9161154796bf2aef1aa746b469.phy fastest1nocalib 4.677767 2697
5fad91d04084ac4043f35092e579c27f.phy fastest2nocalib 5.093961 2036
64a99fde30f869ec30f91c8d2f3fe6fe.phy fastest3nocalib 3.848822 2987
4ff6dce0b7cebc9428b4b77e79356484.phy largest1nocalib 0.267668 30803
ccf55a6ee6d62f840a124bcc0c98ecf5.phy largest2nocalib 0.010000 132188
fe7aa5d9de0f5f51ef6365ef3b7f1e06.phy largest3nocalib 0.312974 12916

PhyloBayes Analysis

Analysis Protocol. Phase 1 – Uncalibrated Analysis

In order to treat the maximum ages specified in the calibration file as soft bounds, the prior must be set to the birth-death process according to the manual. However, if the birth-death prior is combined with a set of calibrations, its hyperparameters must be fixed to specific values rather than estimated from the data. The PhyloBayes manual recommends running a preliminary analysis without calibrations and with free hyperparameters. The resulting estimates can then be used to perform a calibrated analysis.

(Remember: the supplied tree may be unrooted or rooted, but must be in the NEWICK format. If the first line contains information about the number of tips rather than the topology, the spaces will cause segmentation faults.)

../pb -d 3ad39b9161154796bf2aef1aa746b469.phy -T 12_no_outgroups_scheme_3.tre -r acanth.outgroup -ln -bd fastest1nocalib

(In this case, the outgroup file is superfluous, since the tree is already rooted. Therefore, the following 5 runs do not make use of the -r option. This should not affect the results in any way.)

../pb -d 5fad91d04084ac4043f35092e579c27f.phy -T 12_no_outgroups_scheme_3.tre -ln -bd fastest2nocalib
../pb -d 64a99fde30f869ec30f91c8d2f3fe6fe.phy -T 12_no_outgroups_scheme_3.tre -ln -bd fastest3nocalib
../pb -d 4ff6dce0b7cebc9428b4b77e79356484.phy -T 12_no_outgroups_scheme_3.tre -ln -bd largest1nocalib
../pb -d ccf55a6ee6d62f840a124bcc0c98ecf5.phy -T 12_no_outgroups_scheme_3.tre -ln -bd largest2nocalib
../pb -d fe7aa5d9de0f5f51ef6365ef3b7f1e06.phy -T 12_no_outgroups_scheme_3.tre -ln -bd largest3nocalib

Analysis Progress

Example of a Tracer-compatible trace file generated by PhyloBayes:

##   X.cycle X.treegen time  loglik  length   sigma        mu meanrates
## 1       1         0    3 -132220 3.02257 6.17254 0.1128120   1.21207
## 2       2         0    6 -130045 3.53405 5.81433 0.0798298   1.98474
## 3       3         0    8 -129011 3.85674 6.96148 0.2372700   0.73773
## 4       4         0   11 -128331 4.08829 7.05736 0.1749990   1.09214
## 5       5         0   14 -127865 4.30031 6.06568 0.1890880   1.04218
## 6       6         0   17 -127540 4.52842 7.25878 0.1826010   1.09504
##        p1       p2    alpha nmode    stat statalpha   kappa allocent
## 1 3.46031 1.498980 0.289504    81 1.16055   5.01021 14.5527  3.48840
## 2 6.54925 0.899936 0.253237    82 1.15225   5.87930 15.6545  3.53462
## 3 3.74662 1.888960 0.259104    82 1.14826   5.31522 15.0681  3.53537
## 4 3.53956 1.431250 0.267635    81 1.14046   5.84142 14.7984  3.56157
## 5 3.48121 2.049510 0.277655    81 1.13575   5.20172 16.4106  3.56845
## 6 4.57659 1.032550 0.293295    75 1.13833   5.18068 13.8315  3.54547

Progress check: Friday 11/18/2016, 1:05 PM (fastest1nocalib), 1:41 PM (fastest2nocalib), 1:49 PM (fastest3nocalib), 2:01 PM (largest1nocalib), 2:05 PM (largest2nocalib), 1:54 PM (largest3nocalib). The numbers listed in all cells except for the “states” row refer to the effective sample sizes (ESS) calculated using Tracer 1.6:

parameter fast1 fast2 fast3 large1 large2 large3
states 101564 96979 71418 8375 2039 19871
loglik 1325 2310 780 45 3 30
length 2235 2744 1292 2505 469 2543
sigma 4396 3590 2882 11 16 15
mu 1149 1103 535 22 265 32
meanrates 1345 1470 512 79 107 239
p1 1102 1032 573 39 314 56
p2 285 229 243 46 22 1074
alpha 6225 5192 1693 6200 150 14430
nmode 342 389 371 30 42 84
stat 3206 3086 1337 49 3 31
statalpha 34753 4021 7103 4140 206 1286
kappa 583 556 543 101 129 198
allocent 170 251 171 3 3 46

Progress check: Thursday 11/24/2016, 00:13 (fastest1nocalib), 00:15 (fastest2nocalib), 00:17 (fastest3nocalib), 00:24 (largest1nocalib), 00:25 (largest2nocalib), 00:25 (largest3nocalib). The numbers listed in all cells except for the “states” row refer to the effective sample sizes (ESS) calculated using Tracer 1.6:

parameter fast1 fast2 fast3 large1 large2 large3
states 220907 246584 182374 21771 5314 50975
loglik 3442 5831 2456 51 4 74
length 4701 6967 2208 6137 1264 6452
sigma 6426 9339 3262 3619 44 112
mu 2604 2827 1139 618 462 134
meanrates 2376 3012 1560 638 57 111
p1 1573 1957 706 3219 860 818
p2 394 355 296 6658 90 4557
alpha 13401 13734 1879 14922 382 37241
nmode 947 1183 964 59 4 222
stat 6172 6952 3126 56 4 76
statalpha 73688 8656 12776 9557 53 4973
kappa 1503 1689 1464 335 6 836
allocent 462 706 478 10 3 85

Progress check of the unconverged analyses: Friday 12/05/2016, 11:12 PM (largest1nocalib), 11:13 (largest2nocalib), 11:11 PM (largest3nocalib). The numbers listed in all cells except for the “states” row refer to the effective sample sizes (ESS) calculated using Tracer 1.6:

parameter largest1 largest2 largest3
states 43950 10602 102442
loglik 143 4 133
length 11768 2716 11510
sigma 7609 80 213
mu 1125 739 264
meanrates 1145 130 217
p1 6205 1290 1603
p2 12956 213 8595
alpha 30927 718 75924
nmode 112 6 469
stat 155 4 136
statalpha 19358 32 10492
kappa 286 9 1391
allocent 27 4 136

A detailed examination of the table above shows that the parameters for which the chain failed to converge to the posterior are almost exclusively those related to the site-heterogeneous mixture model. This is to be expected given the comparatively large number of sites in the three partitions. In contrast, the only two parameters of direct interest – p1 and p2 – have reached sufficient effective sample sizes in all the 3 analyses. This suggests it might be worthwhile to switch to another approach in order to speed up the analysis, and run the uncalibrated analyses of the 3 largest partitions under a simpler model with just one equlibrium frequency profile (such as GTR+\(\Gamma_4\)) instead of the default mixture of F81 models. Note that since the birth-death process of speciation and extinction is independent of the nucleotide substitution process, it does not matter that p1 and p2 were estimated under CAT-F81 for the 3 fastest partitions and under GTR+\(\Gamma\) for the 3 largest partitions. Therefore, the 3 chains listed above (largest1nocalib, largest2nocalib, largest3nocalib) were soft-stopped, and 3 new chains were started as follows:

../pb -d 4ff6dce0b7cebc9428b4b77e79356484.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 largest1nocal2
../pb -d ccf55a6ee6d62f840a124bcc0c98ecf5.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 largest2nocal2
../pb -d fe7aa5d9de0f5f51ef6365ef3b7f1e06.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 largest3nocal2

Progress check of the unconverged analyses: Friday 12/17/2016, 01:46 PM (largest1nocal2), 01:48 (largest2nocal2), 01:48 PM (largest3nocal2). The numbers listed in all cells except for the “states” row refer to the effective sample sizes (ESS) calculated using Tracer 1.6:

parameter largest1 largest2 largest3
states 4420 1077 9970
loglik 158 944 1318
length 2972 965 2848
sigma 104 86 113
mu 242 869 483
meanrates 277 555 150
p1 717 505 1497
p2 740 150 5781
alpha 2982 64 7534
nmode - - -
stat 2317 969 4357
statalpha - - -
rrent 2303 797 4480
meanrr 326 525 505
kappa - -
allocent - -

Progress check of the unconverged analyses: Friday 12/23/2016, 03:01 AM (largest1nocal2, largest2nocal2, largest3nocal2). The numbers listed in all cells except for the “states” row refer to the effective sample sizes (ESS) calculated using Tracer 1.6:

parameter largest1 largest2 largest3
states 7167 1731 16073
loglik 3995 1506 2364
length 5144 1551 4617
sigma 5386 148 185
mu 476 1428 715
meanrates 1779 492 206
p1 4364 840 2487
p2 3998 212 8591
alpha 4695 118 12361
nmode - - -
stat 3967 1557 6673
statalpha - - -
rrent 4575 951 7184
meanrr 539 787 915
kappa - -
allocent - -

The analyses that reached convergence in all of their parameters were then “soft-stopped” using the following commands:

../stoppb fastest1nocalib 
../stoppb fastest2nocalib 
../stoppb fastest3nocalib 
../stoppb largest1nocal2 
../stoppb largest3nocal2 

The chains stopped at 250673, 282838, 209815, 7176, and 17588 generations (“points”), respectively. In order to obtain reliable estimates of the two parameters of interest (\(p_1 = \lambda - \mu\) = birth rate minus death rate; \(p_2 = \lambda\omega\) = birth rate times sampling rate), the chain was subjected to two different analyses:

  • Using the default readdiv program from the PhyloBayes package. For the fastest partitions, the first 10,000 trees were discarded as burnin and only every 500th tree was included to minimize autocorrelation. For the largest1 chain, which was two orders of magnitude shorter, the first 1000 trees were discarded and the subsampling frequency was set equal to 30. The largest3 chain was assigned a burnin of 2000 generations, with the ../readdiv analysis sampling every 50th tree:

    ../readdiv -x 10000 500 fastest1nocalib
    ../readdiv -x 10000 500 fastest2nocalib
    ../readdiv -x 10000 500 fastest3nocalib
    ../readdiv -x 1000 30 largest1nocal2
    ../readdiv -x 2000 50 largest3nocal2
  • Using Tracer v1.6, in which the first tenth of the collected samples is automatically discarded as burnin and posterior means as well as effective sample sizes (ESS) are automatically computed from the rest. Both the means and the effective sizes are reported below.

chain points (readdiv) p1 (readdiv) p2 (readdiv) p1 ESS p2 ESS p1 (Tracer) p2 (Tracer)
fastest1 481 0.0018922 +/- 0.00139346 0.00148774 +/- 0.00043172 1835 489 1.798 1.45
fastest2 545 0.00141146 +/- 0.00120643 0.00153467 +/- 0.00047274 2327 420 1.405 1.553
fastest3 399 0.00195453 +/- 0.00146607 0.00148684 +/- 0.000455553 886 363 1.987 1.474
largest1 205 0.0172977 +/- 0.00212044 1.48854e-10 +/- 4.22314e-10 4380 4002 17.491 1.139e-7
largest3 311 0.0162411 +/- 0.00201529 2.93377e-10 +/- 7.18424e-10 3043 8995 16.387 3.149e-7

As can be seen from the above, the estimates from readdiv and Tracer are highly similar to each other (except for scaling). To ensure compatibility, the readdiv scaling will be used for the calibrated analyses.

Analysis Protocol. Phase 2 – Prior Analysis

According to the manual, the prior on the root age is automatically determined from the birth-death model when the latter is used as the tree prior. However, the resulting distribution is reported to be “proper, but apparently unstable”. Therefore, it is recommended to specify the -rp option explicitly, and run an analysis without data (i.e., sampling from the prior) but with calibrations. The results are then used to verify that (1) the distributions for the calibrated nodes correspond to user expectations and (2) the root age prior is sufficiently non-informative.

The root age prior is a gamma distribution. The manual recommends setting its mean and standard deviation to the same value in order to obtain an exponential distribution for which that value is the mean. Here, the mean is set to 130 Ma, which corresponds well to the “soft” (95%) upper bound on the oldest calibration in the tree (Aipichthys: 128.82 Ma). The results of the prior-sampling analysis should therefore approximate the following distribution:

## in R, gamma is specified by shape and rate:
## mean = 130 = shape/rate
## stdev = 130 = sqrt(shape)/rate
## --> shape = 1, rate = 1/130

curve(dgamma(x, 1, 1/130), from=0, to=500, 
      main = "Suggested prior", xlab = "Age (in Ma)", ylab = "Probability density")

The following calibration file is used for the prior run under the -sb (soft bound) option:

##                         C1                      C2     C3    C4
## 1                       12                             NA    NA
## 2         lampris_guttatus    aphredoderus_sayanus 128.82 98.00
## 3          polymixia_lowei    aphredoderus_sayanus 128.04 93.90
## 4    percopsis_omiscomycus    aphredoderus_sayanus  93.51 63.10
## 5   sargocentron_coruscum2 myripristis_leiognathus 109.29 49.00
## 6         scomber_scombrus scomberomorus_maculatus  95.58 54.17
## 7         pomacanthus_paru     chaetodon_ocellatus  59.26 29.62
## 8      epibulus_insidiator      scarus_ferrigineus  43.95 11.90
## 9    ogcocephalus_radiatus    cryptopsaras_couesii  53.93 49.00
## 10      xenentodon_cancila   pterophyllum_leopoldi  80.52 49.00
## 11      parachirus_xenicus  poecilopsetta_plinthus  53.88 41.20
## 12 istiophorus_platypterus          mene_maculatus  95.64 55.20
## 13          seriola_zonata          alepes_kleinii  61.61 49.00

The analyses were started with the commands below. Since the runs only differed in the dataset used, which exercised no influence on the results (as the analyses were only allowed to sample from the prior), the reason for performing 3 separate runs was to ensure that the root age prior was sufficiently stable.

../pb -d 3ad39b9161154796bf2aef1aa746b469.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.0018922 0.00148774 -cal calib -sb -rp 130 130 -prior fastest1prior
../pb -d 5fad91d04084ac4043f35092e579c27f.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00141146 0.00153467 -cal calib -sb -rp 130 130 -prior fastest2prior
../pb -d 64a99fde30f869ec30f91c8d2f3fe6fe.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00195453 0.00148684 -cal calib -sb -rp 130 130 -prior fastest3prior

The chains were soft-stopped at 116276 (fastest1prior), 116063 (fastest2prior), and 109005 (fastest3prior) generations and summarized using ../readdiv. Since the analysis only sampled from the prior, no burnin was specified in the command, and each sample was used for the computation. The command prints basic information (mean, st. dev.) about the shape of the root age distribution to the screen and generates a file with the _sample.dates suffix, containing more detailed information.

chain mean root age root age st. dev. lower 95% HPD bound upper 95% HPD bound
fastest1prior 222.134 97.1753 116.01 480.845
fastest2prior 217.357 92.908 114.938 456.943
fastest3prior 238.464 124.758 116.534 605.302

The prior assigns too much probability to excessively old ages. Potentially, this could be mitigated by specifying an exponential such that 95% (or even 99%) of the total probability mass falls below a certain upper bound (e.g., 143 Ma). However, because the root age prior cannot be offset, such an exponential would assign most of the total probability mass to the “impossible” region below 98 Ma (blue in the figure below) rather than to the desired interval between 98 Ma and 143 Ma (red):

## [1] "Blue (impossible) region probability = 0.957403822374395"
## [1] "Red (desired) region probability = 0.0325961776256052"

Therefore, it is preferable to explicitly specify the root age prior as another calibration in the calibration file, since calibrations (loaded with the -cal command) – unlike the root age prior (specified with the -rp command) – can be offset by a hard minimum (here, 98 Ma):

##                         C1                      C2     C3    C4
## 1                       13                             NA    NA
## 2         lampris_guttatus     chaetodon_ocellatus 143.00 98.00
## 3         lampris_guttatus    aphredoderus_sayanus 128.82 98.00
## 4          polymixia_lowei    aphredoderus_sayanus 128.04 93.90
## 5    percopsis_omiscomycus    aphredoderus_sayanus  93.51 63.10
## 6   sargocentron_coruscum2 myripristis_leiognathus 109.29 49.00
## 7         scomber_scombrus scomberomorus_maculatus  95.58 54.17
## 8         pomacanthus_paru     chaetodon_ocellatus  59.26 29.62
## 9      epibulus_insidiator      scarus_ferrigineus  43.95 11.90
## 10   ogcocephalus_radiatus    cryptopsaras_couesii  53.93 49.00
## 11      xenentodon_cancila   pterophyllum_leopoldi  80.52 49.00
## 12      parachirus_xenicus  poecilopsetta_plinthus  53.88 41.20
## 13 istiophorus_platypterus          mene_maculatus  95.64 55.20
## 14          seriola_zonata          alepes_kleinii  61.61 49.00

Analysis Protocol. Phase 3 – Calibrated Analysis

The calibrated PhyloBayes analyses were run under the lognormal relaxed clock model, birth-death tree prior (whose hyperparameters were fixed to the values found in the uncalibrated analyses), and the GTR model with gamma-distributed across-site rate variation (discretized into 4 rate categories):

../pb -d 3ad39b9161154796bf2aef1aa746b469.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.0018922 0.00148774 -cal newcalib -sb -gtr -dgam 4 fastest1cali
../pb -d 5fad91d04084ac4043f35092e579c27f.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00141146 0.00153467 -cal newcalib -sb -gtr -dgam 4 fastest2cali
../pb -d 64a99fde30f869ec30f91c8d2f3fe6fe.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00195453 0.00148684 -cal newcalib -sb -gtr -dgam 4 fastest3cali
../pb -d 4ff6dce0b7cebc9428b4b77e79356484.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.0172977 0.000000000148854 -cal newcalib -sb -gtr -dgam 4 largest1cali
../pb -d fe7aa5d9de0f5f51ef6365ef3b7f1e06.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.0162411 0.000000000293377 -cal newcalib -sb -gtr -dgam 4 largest3cali

Note that:

  • PhyloBayes implements soft bounds on calibrations in a different manner than BEAST. While the latter allows the user to couple a hard lower bound with a soft upper bound, PhyloBayes can only use a soft upper bound with no lower bound (5% probability assigned to the tail of the distribution beyond the bound), or in conjuction with a lower bound that is also soft (2.5% probability assigned to each tail). The latter option seems preferable, but it will be necessary to check if the posterior mean age of some of the calibrated nodes is younger than the calibration allows.
  • The -gtr option is equivalent to -gtr -ncat 1 – i.e., a single equilibrium frequency profile is applied to the entire dataset.

The program assigned the following node numbers to the 13 calibrated nodes, which will be used to refer to their ages in the _sample.dates file:

Node Number
lampris_guttatus + chaetodon_ocellatus 118
lampris_guttatus + aphredoderus_sayanus 119
polymixia_lowei + aphredoderus_sayanus 123
percopsis_omiscomycus + aphredoderus_sayanus 124
sargocentron_coruscum2 + myripristis_leiognathus 132
scomber_scombrus + scomberomorus_maculatus 162
pomacanthus_paru + chaetodon_ocellatus 220
epibulus_insidiator + scarus_ferrigineus 210
ogcocephalus_radiatus + cryptopsaras_couesii 229
xenentodon_cancila + pterophyllum_leopoldi 173
parachirus_xenicus + poecilopsetta_plinthus 189
istiophorus_platypterus + mene_maculatus 192
seriola_zonata + alepes_kleinii 195

Progress check: Friday 12/02/2016, 4:08 PM (fastest1cali), 4:11 PM (fastest2cali), 4:14 PM (fastest3cali). The numbers listed in all cells except for the “states” row refer to the effective sample sizes (ESS) calculated using Tracer 1.6:

parameter fastest1 fastest2 fastest3
states 14700 16599 14004
loglik 4412 5820 5523
length 3413 3964 3467
sigma 5210 4396 4394
mu 1228 1210 1016
meanrates 896 987 702
p1 - - -
p2 - - -
scale 1517 1335 1549
alpha 5122 4856 4390
nmode - - -
stat 3146 3478 2759
statalpha - - -
rrent 5173 5382 5896
meanrr 1147 1478 1089
kappa - - -
allocent - - -

Progress check: Sunday 01/01/2017, 09:52 AM (largest3cali),

parameter largest1 largest3
states 3882 6200
loglik 29 662
length 2963 1828
sigma 18 364
mu 154 446
meanrates 80 452
p1 - -
p2 - -
scale 10 238
alpha 2933 4774
nmode - -
stat 2056 2659
statalpha - -
rrent 2552 2878
meanrr 199 370
kappa - -
allocent - -

The missing ESS values for a number of parameters are likely due to the fact that the parameters in questions were not estimated: for example, the birth-death model hyperparameters p1 and p2 were fixed to specific values, while the GTR+\(\Gamma\) model used for the analysis did not include kappa (transition/transversion ratio) or nmode (the mode of the number of distinct equilibrium frequency profiles). The meaning of the remaining two parameters is less clear, but allocent probably refers to the allocation of individual sites to different components of the profile mixture. It is notable that statalpha has an unchanging value of 20.

The first three analyses to reach convergence were soft-stopped using ../stoppb one day later (Saturday 12/03/2016, fastest1cali: 6:45 PM, fastest2cali: 6:50 PM, fastest3cali: 6:51 PM), to make sure that none of the missing ESS values were in fact caused by insufficient sampling for the parameter in question. (This was confirmed by inspecting the respective .trace files.) The largest3 chain was paused immediately after examining its trace.

The chains stopped at 17413, 19628, 16548, and 6215 generations (“points”), respectively. The resulting time trees were then examined in two ways: by running the ../readdiv program and by visualizing the resulting _sample.labels file in FigTree v1.4.2. For the three “fastest” chains, the subsampling frequency was conservatively estimated at 50 – in fact, according to the investigation of the final trace files in Tracer, none of the estimated parameters showed autocorrelation times greater than 15 (fastest1cali, fastest2cali) or 20 (fastest3cali). The burnin was specified to comprise the first tenth of each run, as is the default setting in Tracer. For the largest3 chain, the same general approach was followed, but the subsampling frequency was decreased to 25 to keep the final number of samples above 200. Once again, this frequency was only chosen after making sure that it did not exceed the longest autocorrelation time observed in Tracer. None of the chains showed a clearly defined burnin stage when examined in Tracer; instead of an initial rapid climb, the traces immediately assumed the typical “white noise” pattern indicative of reaching stationarity.

../readdiv -x 174 50 fastest1cali
../readdiv -x 196 50 fastest2cali
../readdiv -x 165 50 fastest3cali
../readdiv -x 622 25 largest3cali
fastest1cali PhyloBayes time tree

fastest1cali PhyloBayes time tree

fastest2cali PhyloBayes time tree

fastest2cali PhyloBayes time tree

fastest3cali PhyloBayes time tree

fastest3cali PhyloBayes time tree

largest3cali PhyloBayes time tree

largest3cali PhyloBayes time tree

As mentioned above, it is necessary to check if any of the calibration nodes are younger than allowed by their lower bounds. In the table below, node ages above the upper bound are in italics and ages below the lower bound are in bold.

Calibration Lower bound Upper bound Node number Fastest 1 Fastest 2 Fastest 3 Largest 3
Aipichthys 98.0 128.82 119 128.253 128.0939 127.2215 96.8281
Berybolcensis 49.0 109.29 132 78.4578 80.785 81.5117 95.212
Calatomus 11.9 43.95 210 26.9649 23.3516 26.1327 42.5489
Chaetodontidae 29.62 59.26 220 59.467 60.4535 59.8 58.98
Eobuglossus 41.2 53.88 189 43.3564 42.4537 46.1384 53.2311
Eucoelopoma 54.17 95.58 162 60.1789 67.7836 61.7867 91.9956
Homonotichthys 93.9 128.04 123 118.2912 120.7762 120.7594 94.8645
Mcconichthys 63.1 93.51 124 49.1689 56.5919 46.1071 92.4408
Ramphexocoetus 49.0 80.52 173 77.4699 75.2663 75.334 57.8225
Tarkus 49.0 53.93 229 49.6837 50.0037 49.5935 52.5946
Mene 55.2 95.64 192 72.4889 71.6771 71.2218 56.8813
Eastmanelepes 49.0 61.61 195 52.6297 51.0873 52.7224 57.2809

Note that the complete information on overflows and underflows (reporting percentages of the 95% HPDs that extend beyond the bounds rather than just the means) is stored in the _sample.calib file generated by readdiv.

Analysis Protocol. Phase 4 – Extended Calibrations

A more comprehensive dataset containing 29 calibration points was used to construct a new calibration file. Of these, only 28 points were included in the final file, since the deepest calibration was assigned to the root of the entire tree (including outgroups, which were removed from the phylobayes alignment files). On the other hand, the same root calibration that was used in the previous phase of the analysis was also added to the new calibration file in order to restrict the default improper uniform prior on the root age to a paleontologically realistic time range:

##                           C1                       C2    C3     C4
## 1                         29                             NA     NA
## 2           lampris_guttatus       scarus_ferrigineus 143.0 98.000
## 3       anoplogaster_cornuta       scarus_ferrigineus 128.0 93.900
## 4        epibulus_insidiator       scarus_ferrigineus  63.3 11.900
## 5         balistes_capriscus canthigaster_figueiredoi  69.6 50.500
## 6         takifugu_occelatus canthigaster_figueiredoi  58.5 32.020
## 7      ogcocephalus_radiatus     cryptopsaras_couesii  61.6 49.000
## 8           pomacanthus_paru     acanthurus_olivaceus  80.8 54.170
## 9             naso_unicornis     acanthurus_olivaceus  69.3 49.000
## 10          pomacanthus_paru        chaetodon_kleinii  66.0 29.620
## 11       coryphaena_hippurus           alepes_kleinii  80.8 54.170
## 12            seriola_zonata           alepes_kleinii  69.3 49.000
## 13   istiophorus_platypterus           mene_maculatus  94.6 55.200
## 14    citharoides_macrolepis   poecilopsetta_plinthus  79.7 49.000
## 15        parachirus_xenicus   poecilopsetta_plinthus  66.6 41.200
## 16        centropomus_medius       sphyraena_putnamae  79.7 49.000
## 17        xenentodon_cancila    pterophyllum_leopoldi  79.7 49.000
## 18 pholidichthys_leucotaenia    pterophyllum_leopoldi  67.4 45.460
## 19         syngnathus_fuscus   pseudupeneus_maculatus 111.3 69.710
## 20         syngnathus_fuscus            aulostomus_sp  93.7 49.000
## 21          scomber_scombrus  scomberomorus_maculatus  94.2 54.170
## 22    sargocentron_coruscum2  myripristis_leiognathus  93.7 49.000
## 23          lampris_guttatus             gadus_morhua 143.0 98.000
## 24                zeus_faber             gadus_morhua 111.3 69.710
## 25  coryphaenoides_rupestris             gadus_morhua  94.4 53.700
## 26                zeus_faber      zenopsis_conchifera  91.4 32.020
## 27           polymixia_lowei     aphredoderus_sayanus 128.0 93.900
## 28     percopsis_omiscomycus     aphredoderus_sayanus 109.6 58.551
## 29          lampris_guttatus         regalecus_glesne 122.5 54.170
## 30              zu_elongatus         regalecus_glesne  94.0  5.330

Note that the ages for the tree as a whole and for the lampridiform-polymixiiform-percopsiform-zeiform-gadiform assemblage are drawn from the same prior.

The new round of analyses was performed with exactly the same settings, except for the use of a different set of calibrations:

../pb -d 3ad39b9161154796bf2aef1aa746b469.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.0018922 0.00148774 -cal extendcalib -sb -gtr -dgam 4 fastest1ext
../pb -d 5fad91d04084ac4043f35092e579c27f.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00141146 0.00153467 -cal extendcalib -sb -gtr -dgam 4 fastest2ext
../pb -d 64a99fde30f869ec30f91c8d2f3fe6fe.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00195453 0.00148684 -cal extendcalib -sb -gtr -dgam 4 fastest3ext
../pb -d fe7aa5d9de0f5f51ef6365ef3b7f1e06.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.0162411 0.000000000293377 -cal extendcalib -sb -gtr -dgam 4 largest3ext

Progress check: Friday 12/14/2016, 9:38 AM (fastest1ext), 9:38 AM (fastest2ext), 9:39 AM (fastest3ext). The numbers listed in all cells except for the “states” row refer to the effective sample sizes (ESS) calculated using Tracer 1.6:

parameter fastest1 fastest2 fastest3
states 17160 19164 16524
loglik 5111 6159 6045
length 3276 4328 4064
sigma 5133 7195 4589
mu 1162 1446 1243
meanrates 803 939 822
p1 - - -
p2 - - -
scale 458 1112 319
alpha 5733 6429 5225
nmode - - -
stat 3464 4018 3597
statalpha - - -
rrent 6395 7131 7131
meanrr 1596 1605 1675

Based on these results, the three chains were almost immediately soft-stopped (fastest1ext: 10:18 AM, fastest2ext: 10:21 AM, fastest3ext: 10:22 AM) at 17217, 19235, and 16583 generations (“points”), respectively. The “largest3ext” chain was soft-stopped on Sunday 1/15/2017 at 27463 generations. The resulting trees were then examined as described above, with the burnin fraction again set equal to one tenth of the total length of the chain and the subsampling rate estimated at 50 samples. For comparison, the longest autocorrelation times estimated by Tracer were equal to 33.74 for fastest1ext, 18.38 for fastest2ext, and 46.63 for fastest3ext. An exception was made for the “largest3ext” chain, where an examination in Tracer suggested the presence of autocorrelation times of up to 120.58, and where the subsampling frequency was therefore raised to 1 sample per 125 generation, while the burnin proportion was lowered to 2463 generations (~9% of the total chain length) in order to keep the number of samples at a minimum of 200. All traces showed the same features as those of the first set of chains, i.e., no clearly defined burnin and an overall “white noise” pattern.

../readdiv -x 1722 50 fastest1ext
../readdiv -x 1924 50 fastest2ext
../readdiv -x 1658 50 fastest3ext
../readdiv -x 2463 125 largest3ext
fastest1ext PhyloBayes time tree

fastest1ext PhyloBayes time tree

fastest2ext PhyloBayes time tree

fastest2ext PhyloBayes time tree

fastest3ext PhyloBayes time tree

fastest3ext PhyloBayes time tree

largest3ext PhyloBayes time tree

largest3ext PhyloBayes time tree

Calibration overflows (italics) and underflows (bold):

Calibration Lower bound Upper bound Node # Fastest1 Fastest2 Fastest3 Largest3
Root 98.0 143.0 118 139.862 139.168 139.838 99.2247
Hoplopteryx 93.9 128.0 129 128.028 127.967 127.744 96.5496
Calatomus 11.9 63.3 210 31.1886 27.3302 31.8635 62.8299
Triodon 50.5 69.6 230 57.9151 55.1725 57.4373 61.6498
Archaeotetraodon 32.02 58.5 231 21.3779 20.7734 23.018 60.5811
Tarkus 49.0 61.6 229 49.7678 51.1516 49.691 59.8781
Luvarus 54.17 80.8 219 75.079 72.9126 74.6186 65.222
Proacanthurus 49.0 69.3 222 52.0107 52.6224 52.8573 63.9612
Chaetodontidae 29.62 66.0 220 71.7011 68.176 70.189 63.5144
Archaeus 54.17 80.8 193 56.3517 56.7469 57.0398 64.9513
Eastmanalepes 49.0 69.3 195 52.3301 51.6484 52.4744 63.9605
Mene 55.2 94.6 192 71.74 71.7088 69.5679 62.9523
Eobothus 49.0 79.7 186 61.4272 59.273 65.103 64.218
Eobuglossus 41.2 66.6 189 43.6976 42.3343 44.8876 62.4628
Sphyraena 49.0 79.7 183 73.4988 75.6311 76.2236 65.1913
Rhamphexocoetus 49.0 79.7 173 75.3851 72.8168 70.1643 64.3795
Mahengechromis 45.46 67.4 174 65.1868 63.6862 64.5467 62.2615
Gasterorhamphosus 69.71 111.3 154 80.6308 81.4773 79.0943 66.9342
Prosolenostomus 49.0 93.7 155 74.4455 77.8095 76.7663 65.1738
Eocoelopoma 54.17 94.2 162 59.956 68.1011 60.4741 64.2698
Berybolcensis 49.0 93.7 132 66.4465 71.7619 69.9133 77.9071
Aipichthys 98.0 143.0 119 134.434 132.915 131.604 98.0569
Cretzeus 69.71 111.3 125 108.019 107.643 107.209 96.578
Rhinocephalus 53.7 94.4 128 54.7493 54.2768 54.0105 77.8132
Zenopsis 32.02 91.4 126 45.4801 42.5645 43.8688 62.3619
Homonotichthys 93.9 128.0 123 122.279 123.691 123.789 96.6863
Massamorichthys 58.551 109.6 124 41.5663 50.3007 40.3652 94.9388
Turkmene 54.17 122.5 120 74.1299 70.6346 65.7669 96.0469
Trachipterus 5.33 94.0 121 22.3674 22.095 30.3064 93.1854

Note that the complete information on overflows and underflows (reporting percentages of the 95% HPDs that extend beyond the bounds rather than just the means) is stored in the _sample.calib file generated by readdiv.

Analysis Protocol. Phase 5 – Corrected Calibrations

Due to the omission of an outgroup in the sequence used to calculate the upper bound on the Homonotichthys calibration (assigned to the (Polymixia + Aphredoderus) node), a new calibration file had to be created to correct for the error:

##                         C1                      C2     C3    C4
## 1                       13                             NA    NA
## 2         lampris_guttatus     chaetodon_ocellatus 143.00 98.00
## 3         lampris_guttatus    aphredoderus_sayanus 128.82 98.00
## 4          polymixia_lowei    aphredoderus_sayanus 116.35 93.90
## 5    percopsis_omiscomycus    aphredoderus_sayanus  93.51 63.10
## 6   sargocentron_coruscum2 myripristis_leiognathus 109.29 49.00
## 7         scomber_scombrus scomberomorus_maculatus  95.58 54.17
## 8         pomacanthus_paru     chaetodon_ocellatus  59.26 29.62
## 9      epibulus_insidiator      scarus_ferrigineus  43.95 11.90
## 10   ogcocephalus_radiatus    cryptopsaras_couesii  53.93 49.00
## 11      xenentodon_cancila   pterophyllum_leopoldi  80.52 49.00
## 12      parachirus_xenicus  poecilopsetta_plinthus  53.88 41.20
## 13 istiophorus_platypterus          mene_maculatus  95.64 55.20
## 14          seriola_zonata          alepes_kleinii  61.61 49.00

This file was then used to re-run all of the 5 calibrated PhyloBayes analyses summarized above. As in Phase 4, the settings were left unchanged except for the calibration file replacement:

../pb -d 3ad39b9161154796bf2aef1aa746b469.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.0018922 0.00148774 -cal calib_root_corrected -sb -gtr -dgam 4 fastest1corr
../pb -d 5fad91d04084ac4043f35092e579c27f.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00141146 0.00153467 -cal calib_root_corrected -sb -gtr -dgam 4 fastest2corr
../pb -d 64a99fde30f869ec30f91c8d2f3fe6fe.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00195453 0.00148684 -cal calib_root_corrected -sb -gtr -dgam 4 fastest3corr
../pb -d 4ff6dce0b7cebc9428b4b77e79356484.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.0172977 0.000000000148854 -cal calib_root_corrected -sb -gtr -dgam 4 largest1corr
../pb -d fe7aa5d9de0f5f51ef6365ef3b7f1e06.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.0162411 0.000000000293377 -cal calib_root_corrected -sb -gtr -dgam 4 largest3corr

The analyses were soft-stopped on 02/11/2017 at 9:37 PM (fastest1corr: 16899 generations), 9:43 PM (fastest2corr: 18923 generations), and 9:45 PM (fastest3corr: 16254 generations); and on 03/05/2017 at 9:46 PM (largest3corr: 56023 generations). The examination of the resulting trace files in Trace yielded the following effective sample sizes for the sampled parameters:

parameter fastest1corr fastest2corr fastest3corr largest3corr
states 16899 18923 16254 56023
loglik 4522 7054 6033 6184
length 3561 3499 4370 9741
sigma 7102 5885 4887 314
mu 1242 1573 1081 2571
meanrates 1680 1676 1091 973
p1 - - - -
p2 - - - -
scale 2246 3170 3215 206
alpha 5488 6051 5488 43110
nmode - - - -
stat 3779 4471 3511 23788
statalpha - - - -
rrent 6821 6647 6616 25497
meanrr 1070 1540 1396 3791
kappa - - - -
allocent - - - -

The longest autocorrelation times found using Tracer were equal to 14.22 for fastest1corr (meanrr), 11.06 (meanrr) for fastest2corr, 13:54 (mu) for fastest3corr, and 244.66 (scale) for largest3corr, respectively. Based on these values, the chains were summarized as follows:

../readdiv -x 1690 50 fastest1corr
../readdiv -x 1892 50 fastest2corr
../readdiv -x 1625 50 fastest3corr
../readdiv -x 5602 250 largest3corr
fastest1corr PhyloBayes time tree

fastest1corr PhyloBayes time tree

fastest2corr PhyloBayes time tree

fastest2corr PhyloBayes time tree

fastest3corr PhyloBayes time tree

fastest3corr PhyloBayes time tree

largest3corr PhyloBayes time tree

largest3corr PhyloBayes time tree

Calibration overflows (italics) and underflows (bold):

Calibration Lower bound Upper bound Node # fastest1 fastest2 fastest3 largest3
Aipichthys 98.0 128.82 119 126.783 125.313 122.758 97.0571
Berybolcensis 49.0 109.29 132 78.2571 80.9249 82.0382 56.6446
Calatomus 11.9 43.95 210 26.6171 23.4019 25.7685 45.0734
Chaetodontidae 29.62 59.26 220 59.9596 60.5989 59.167 54.7753
Eobuglossus 41.2 53.88 189 43.5713 42.2855 46.3462 53.205
Eucoelopoma 54.17 95.58 162 59.3606 67.9218 62.5009 55.0984
Homonotichthys 93.9 116.35 123 114.716 115.321 114.571 96.0677
Mcconichthys 63.1 93.51 124 46.7736 54.0181 43.2484 93.7347
Ramphexocoetus 49.0 80.52 173 77.5816 75.2246 76.3504 54.7811
Tarkus 49.0 53.93 229 49.6754 50.0795 49.7156 53.1321
Mene 55.2 95.64 192 72.717 72.109 72.1927 54.8223
Eastmanelepes 49.0 61.61 195 52.6179 51.0957 52.9884 54.8232

The complete information abou overflows and underflows is stored in the corresponding _sample.calib files.

BEAST Analysis

Since the analysis is to be run on a fixed topology, it was not necessary to enforce monophyly in the “Taxa” tab of BEAUTi.

No tip dates or traits have been specified in the XML file. The substitution model was set to GTR+\(\Gamma\) with estimated base frequencies and the \(\Gamma\) model of site heterogeneity discretized into 4 rate categories. No codon partitioning was used. A lognormal uncorrelated relaxed clock was selected as the clock model (without continuous quantile parameterization).

Calculating the calibration priors so that the given upper bound correspond to the 95th percentile of an exponential distribution offset by the lower bound:

\[ \text{mean} = \frac{1}{\lambda} = \frac{\text{upper bound} - \text{offset}}{\text{95th percentile}} \]

## [1] "Aipichthys = 10.2879687454302"
## [1] "Berybolcensis = 20.1252964199217"
## [1] "Calatomus = 10.6985528322855"
## [1] "Chaetodontidae = 9.8940750686097"
## [1] "Eobuglossus = 4.23268798481684"
## [1] "Eocoelopoma = 13.8229975907938"
## [1] "Homonotichthys = 11.3962119717387"
## [1] "Mcconichthys = 10.1511073831451"
## [1] "Ramphexocoetus = 10.5216344859169"
## [1] "Tarkus = 1.645674429428"
## [1] "Mene = 13.4992036361193"
## [1] "Eastmanelepes = 4.20932141076816"

In order to keep the tree topology constant, all the rearrangements operators (subtreeSlide, narrowEXchange, wideExchange, wilsonBalding, as well as subtreeLeap, which is off by default) were deactivated in BEAUTi. All the remaining operators were kept at their default values.

(Note: Unchecking the topology operators manually simply leads to BEAUTi reverting to the original setup. When that happens, the operators must be deleted directly from the XML file. However, it is is possible to prevent this by selecting “fixed tree topology” in the “Operator mix” drop-down menu.)

The length of the chain was set equal to 50 million generations, sampling every 1000 generations.

beast -threads 12 -beagle_SSE fastest1.xml
beast -threads 12 -beagle_SSE fastest2.xml
beast -threads 12 -beagle_SSE fastest3.xml
beast -threads 12 -beagle_SSE largest1.xml
beast -threads 12 -beagle_SSE largest2.xml
beast -threads 12 -beagle_SSE largest3.xml

Progress check: Friday 11/25/2016, 5:35 PM (fastest1), 5:44 PM (fastest2), 6:06 PM (fastest3), 6:07 PM (largest1), 6:09 PM (largest2), 6:10 PM (largest3)

(The order-of-magnitude discrepancies are caused by the fact that different analyses were started on different days.)

Progress check: Friday 12/02/2016, 5:36 PM (fastest1run2), 5:39 (fastest2run2), 5:42 (fastest3), 6:02 (largest1), 6:03 (largest2), 6:06 (largest3)

Progress check: Wednesday 12/14/2016, 4:48 PM (fastest1run2), 4:49 (fastest2run2), 4:50 (fastest3run2), 4:52 (largest1), 4:53 (largest2), 4:54 (largest3)

Progress check: Monday 12/19/2016, 11:16 AM (fastest1run2), 11:18 AM (fastest2run2), 11:36 AM (fastest3run2), 11:38 AM (largest1), 11:40 AM (largest2), 10:11 AM (largest3)

Progress check: Sunday 01/01/2017, 09:37 AM (fastest1run2), 09:38 AM (fastest3run2), 9:40 AM (largest1), 09:45 AM (largest2)

Progress check: Tuesday 01/10/2017, 8:30 PM (fastest1run2), 8:36 PM (fastest3run2), 8:38 PM (largest1), 8:40 PM (largest2)

Progress check: Wednesday 01/11/2017, 10:58 AM (fastest3run2), 10:56 AM (largest1), 11:05 AM (largest2)

Progress check: Monday 01/16/2017, 9:41 PM (fastest3run2), 9:44 PM (largest2)

Progress check: Tuesday 02/07/2017, 01:10 PM (largest2)

Operator analysis – fastest1

Operator Tuning Count Time Time/Op Pr(accept)
scale(ac) 0.83 660747 12725277 19.26 0.2311
scale(ag) 0.883 661295 12735820 19.26 0.1791
scale(at) 0.841 659766 12709519 19.26 0.2183
scale(cg) 0.853 659551 12696596 19.25 0.2257
scale(gt) 0.823 659544 12715860 19.28 0.2141
frequencies 0.013 659030 12690144 19.26 0.239
scale(alpha) 0.703 658986 12667283 19.22 0.2383
scale(ucld.mean) 0.937 1979321 38020330 19.21 0.1951
scale(ucld.stdev) 0.949 1979589 30209713 15.26 0.1781
scale(treeModel.rootHeight) 0.887 1978926 2525621 1.28 0.2174
uniform(nodeHeights(treeModel)) 19799523 66403294 3.35 0.1446
scale(birthDeath.meanGrowthRate) 0.553 1980489 313550 0.16 0.2424
scale(birthDeath.relativeDeathRate) 0.151 1981607 301348 0.15 0.2699
up:ucld.mean down:nodeHeights(treeModel) 0.962 1979956 23097102 11.67 0.2333
swapOperator(branchRates.categories) 6600069 26808160 4.06 0.0911
uniformInteger(branchRates.categories) 6601601 20645637 3.13 0.1637

Operator analysis – fastest2

Operator Tuning Count Time Time/Op Pr(accept)
scale(ac) 0.82 660158 8672488 13.14 0.2296
scale(ag) 0.887 659057 8662889 13.14 0.2116
scale(at) 0.835 660445 8684486 13.15 0.226
scale(cg) 0.848 660260 8685804 13.16 0.2234
scale(gt) 0.821 659789 8666099 13.13 0.2262
frequencies 0.015 659293 8669795 13.15 0.2336
scale(alpha) 0.661 661119 8666069 13.11 0.2407
scale(ucld.mean) 0.934 1980229 25974357 13.12 0.2012
scale(ucld.stdev) 0.946 1979266 22205332 11.22 0.1939
scale(treeModel.rootHeight) 0.89 1978597 1821766 0.92 0.2478
uniform(nodeHeights(treeModel)) 19807202 46560273 2.35 0.1559
scale(birthDeath.meanGrowthRate) 0.557 1982184 293504 0.15 0.2459
scale(birthDeath.relativeDeathRate) 0.147 1979410 281725 0.14 0.2628
up:ucld.mean down:nodeHeights(treeModel) 0.962 1981765 15764484 7.95 0.2316
swapOperator(branchRates.categories) 6595972 18805478 2.85 0.1027
uniformInteger(branchRates.categories) 6595254 14463753 2.19 0.1834

Operator analysis – fastest3

Operator Tuning Count Time Time/Op Pr(accept)
scale(ac) 0.839 661439 26785582 40.5 0.2199
scale(ag) 0.89 659014 26717199 40.54 0.1878
scale(at) 0.836 662035 26830436 40.53 0.2104
scale(cg) 0.859 659911 26733550 40.51 0.2284
scale(gt) 0.833 659782 26725992 40.51 0.2259
frequencies 0.013 659708 26711803 40.49 0.239
scale(alpha) 0.673 660367 26708947 40.45 0.2404
scale(ucld.mean) 0.939 1979483 80118681 40.47 0.1863
scale(ucld.stdev) 0.953 1978891 63173044 31.92 0.1838
scale(treeModel.rootHeight) 0.882 1980427 5096614 2.57 0.2331
uniform(nodeHeights(treeModel)) 19799578 136515823 6.89 0.1485
scale(birthDeath.meanGrowthRate) 0.548 1979678 529734 0.27 0.2361
scale(birthDeath.relativeDeathRate) 0.147 1980545 505737 0.26 0.261
up:ucld.mean down:nodeHeights(treeModel) 0.962 1978362 48494033 24.51 0.2346
swapOperator(branchRates.categories) 6601937 55296150 8.38 0.0903
uniformInteger(branchRates.categories) 6598843 42006022 6.37 0.1676

Operator analysis – largest1

Operator Tuning Count Time Time/Op Pr(accept)
scale(ac) 0.867 658820 183774525 278.94 0.2188
scale(ag) 0.899 658719 183783261 279.0 0.1982
scale(at) 0.866 660382 184133731 278.83 0.2087
scale(cg) 0.877 661270 184529128 279.05 0.2193
scale(gt) 0.866 659165 183948566 279.06 0.2158
frequencies 0.01 659218 183698826 278.66 0.2376
scale(alpha) 0.674 661266 184561183 279.1 0.2436
scale(ucld.mean) 0.95 1980189 552340540 278.93 0.1808
scale(ucld.stdev) 0.984 1980148 291959479 147.44 0.0686
scale(treeModel.rootHeight) 0.969 1979815 26753339 13.51 0.1436
uniform(nodeHeights(treeModel)) 19797336 969520273 48.97 0.4089
scale(birthDeath.meanGrowthRate) 0.553 1979627 421092 0.21 0.2403
scale(birthDeath.relativeDeathRate) 0.151 1983201 415925 0.21 0.2688
up:ucld.mean down:nodeHeights(treeModel) 0.964 1979966 338319988 170.87 0.2447
swapOperator(branchRates.categories) 6598919 397530566 60.24 0.191
uniformInteger(branchRates.categories) 6601959 303947951 46.04 0.279

Operator analysis – largest2

Operator Tuning Count Time Time/Op Pr(accept)
scale(ac) 0.212 659982 506771328 767.86 0.0732
scale(ag) 0.223 660596 509593306 771.41 0.0752
scale(at) 0.216 659540 509270896 772.16 0.0718
scale(cg) 0.213 659038 573688140 870.49 0.0695
scale(gt) 0.21 660452 510072660 772.31 0.0695
frequencies 0.005 660389 511310855 774.26 0.0562
scale(alpha) 0.187 658735 187479801 284.61 0.1493
scale(ucld.mean) 0.303 1979260 563090279 284.5 0.1712
scale(ucld.stdev) 0.146 1981161 460653492 232.52 0.2578
scale(treeModel.rootHeight) 0.647 1979746 30696821 15.51 0.2509
uniform(nodeHeights(treeModel)) 19802437 879144583 44.4 0.7514
scale(birthDeath.meanGrowthRate) 0.552 1980792 461126 0.23 0.2559
scale(birthDeath.relativeDeathRate) 0.165 1980368 355828 0.18 0.2458
up:ucld.mean down:nodeHeights(treeModel) 0.961 1979370 344807687 174.2 0.2508
swapOperator(branchRates.categories) 6599973 356873046 54.07 0.9937
uniformInteger(branchRates.categories) 6598161 274060739 41.54 0.9957

Operator analysis – largest3

Operator Tuning Count Time Time/Op Pr(accept)
scale(ac) 0.819 659031 89206726 135.36 0.2316
scale(ag) 0.857 660868 89427641 135.32 0.2107
scale(at) 0.824 659432 89315593 135.44 0.2287
scale(cg) 0.821 659377 89299977 135.43 0.2338
scale(gt) 0.81 662027 89636024 135.4 0.2156
frequencies 0.015 660602 89473019 135.44 0.2348
scale(alpha) 0.617 659716 89281998 135.33 0.2415
scale(ucld.mean) 0.929 1980263 268234737 135.45 0.1957
scale(ucld.stdev) 0.978 1978061 146797457 74.21 0.1177
scale(treeModel.rootHeight) 0.953 1979592 13203639 6.67 0.1856
uniform(nodeHeights(treeModel)) 19806743 438928272 22.16 0.4735
scale(birthDeath.meanGrowthRate) 0.555 1979445 497254 0.25 0.2418
scale(birthDeath.relativeDeathRate) 0.157 1976384 490952 0.25 0.2814
up:ucld.mean down:nodeHeights(treeModel) 0.962 1980806 161401155 81.48 0.2281
swapOperator(branchRates.categories) 6598272 180768292 27.4 0.2154
uniformInteger(branchRates.categories) 6599381 136558073 20.69 0.3281

Operator analysis – fastest1run2

Operator Tuning Count Time Time/Op Pr(accept)
scale(ac) 0.829 7259872 169113074 23.29 0.2283
scale(ag) 0.901 7261054 169072111 23.28 0.2226
scale(at) 0.843 7262718 169158813 23.29 0.2216
scale(cg) 0.855 7256664 169082966 23.3 0.2304
scale(gt) 0.833 7267287 169272563 23.29 0.2306
frequencies 0.014 7258025 169035142 23.29 0.237
scale(alpha) 0.705 7257684 168809560 23.26 0.2404
scale(ucld.mean) 0.941 21781779 506603278 23.26 0.2095
scale(ucld.stdev) 0.952 21776519 412252595 18.93 0.1948
scale(treeModel.rootHeight) 0.888 21769604 32485229 1.49 0.2224
uniform(nodeHeights(treeModel)) 217805411 852096133 3.91 0.1454
scale(birthDeath.meanGrowthRate) 0.554 21782949 4248737 0.2 0.2431
scale(birthDeath.relativeDeathRate) 0.146 21778539 4047575 0.19 0.2591
up:ucld.mean down:nodeHeights(treeModel) 0.962 21778069 305910176 14.05 0.2328
swapOperator(branchRates.categories) 72601538 343235335 4.73 0.0909
uniformInteger(branchRates.categories) 72602288 263832702 3.63 0.1639

Operator analysis – fastest2run2

Operator Tuning Count Time Time/Op Pr(accept)
scale(ac) 0.819 659031 89206726 135.36 0.2316
scale(ag) 0.857 660868 89427641 135.32 0.2107
scale(at) 0.824 659432 89315593 135.44 0.2287
scale(cg) 0.821 659377 89299977 135.43 0.2338
scale(gt) 0.81 662027 89636024 135.4 0.2156
frequencies 0.015 660602 89473019 135.44 0.2348
scale(alpha) 0.617 659716 89281998 135.33 0.2415
scale(ucld.mean) 0.929 1980263 268234737 135.45 0.1957
scale(ucld.stdev) 0.978 1978061 146797457 74.21 0.1177
scale(treeModel.rootHeight) 0.953 1979592 13203639 6.67 0.1856
uniform(nodeHeights(treeModel)) 19806743 438928272 22.16 0.4735
scale(birthDeath.meanGrowthRate) 0.555 1979445 497254 0.25 0.2418
scale(birthDeath.relativeDeathRate) 0.157 1976384 490952 0.25 0.2814
up:ucld.mean down:nodeHeights(treeModel) 0.962 1980806 161401155 81.48 0.2281
swapOperator(branchRates.categories) 6598272 180768292 27.4 0.2154
uniformInteger(branchRates.categories) 6599381 136558073 20.69 0.3281

Operator analysis – fastest3run2

Operator Tuning Count Time Time/Op Pr(accept)
scale(ac) 0.842 6599217 163846742 24.83 0.2268
scale(ag) 0.902 6602838 163951771 24.83 0.2182
scale(at) 0.846 6600926 163866015 24.82 0.2285
scale(cg) 0.855 6602336 163899948 24.82 0.2184
scale(gt) 0.831 6601330 163881819 24.83 0.2228
frequencies 0.014 6596966 163790684 24.83 0.2362
scale(alpha) 0.674 6598637 163636433 24.8 0.2413
scale(ucld.mean) 0.942 19804154 490972911 24.79 0.1999
scale(ucld.stdev) 0.957 19793283 382899915 19.34 0.203
scale(treeModel.rootHeight) 0.883 19797648 31382714 1.59 0.2278
uniform(nodeHeights(treeModel)) 197999369 809532330 4.09 0.1481
scale(birthDeath.meanGrowthRate) 0.553 19798376 3563231 0.18 0.241
scale(birthDeath.relativeDeathRate) 0.144 19805473 3413702 0.17 0.2545
up:ucld.mean down:nodeHeights(treeModel) 0.962 19797881 296985830 15.0 0.2332
swapOperator(branchRates.categories) 65993010 327338784 4.96 0.0904
uniformInteger(branchRates.categories) 66008556 252144290 3.82 0.1674

Tracer effective sample sizes (ESS)

parameter fastest1 fastest2 fastest3 largest1 largest2 largest3
likelihood 3443 4600 3535 77 13 1984
treeModel.rootHeight 29 61 36 32 75 35
tmrca(Aipichthys) 40 96 26 137 475 36
tmrca(Berybolcensis) 204 229 221 57 8034 259
tmrca(Calatomus) 28 38 31 59 723 384
tmrca(Chaetodontidae) 62 92 87 132 8185 498
tmrca(Eobuglossus) 303 255 193 47 5676 174
tmrca(Eucoelopoma) 170 281 196 84 3149 370
tmrca(Homonotichthys) 29 67 214 133 4325 437
tmrca(Mcconichthys) 57 282 94 42 9705 40
tmrca(Ramphexocoetus) 167 94 86 158 6373 238
tmrca(Tarkus) 1278 1671 999 21 3706 447
tmrca(Mene) 156 106 247 313 2225 565
tmrca(Eastmanelepes) 318 722 393 660 5607 3173
birthDeath.meanGrowthRate 48 235 87 141 25 246
birthDeath.relativeDeathRate 9717 2151 1954 22290 210 35211
ucld.mean 17 37 19 12 28 21
ucld.stdev 102 136 47 12802 70 16142
meanRate 19 51 26 19 28 48
coefficientOfVariation 53 27 66 62 69 118
covariance 55 22 59 3270 42771 1448
speciation 18 51 26 19 25 47

Based on these values, three new XLM files were generated using BEAUTi for additional runs. The settings were exactly identical except for the length of the chain, which was set to 550 million generations for fastest1, 450 million generations for fastest2, and 500 million generations for fastest3. The lengths were chosen so that even the parameter with fewest samples would reach an ESS of 200 in both chains, under the conservative assumption that ESS increases linearly with chain length.

Since neither the posterior trace nor the likelihood trace showed a clearly delimited burnin stage when examined in Tracer, the default burnin proportion of 1/10 was reduced to 5 million generations, the same as for the first run. This fraction was then removed from both chains as burnin in the process of combining them using LogCombiner:

java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogCombiner -burnin 5000000 fastest1.log fastest1-run2.log fastest1-combined.log
java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogCombiner -burnin 5000000 fastest2.log fastest2-run2.log fastest2-combined.log
java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogCombiner -burnin 5000000 fastest3.log fastest3-run2.log fastest3-combined.log

The combined files were then re-downloaded and analyzed in Tracer on the local machine in addition to being analyzed on the cluster using LogAnalyser:

java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogAnalyser -burnin 0 -ess fastest1-combined.log
java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogAnalyser -burnin 0 -ess fastest2-combined.log
java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogAnalyser -burnin 0 -ess fastest3-combined.log

fastest1

parameter run1 (burnin = 5e6) run2 (burnin = 5e6) combined (Tracer) combined (LogAnalyser)
posterior 38 502 534 534
prior 22 283 302 302
likelihood 3443 31955 35609 35609
treeModel.rootHeight 29 305 332 332
tmrca(Aipichthys) 40 468 504 123
tmrca(Berybolcensis) 204 1983 2196 2196
tmrca(Calatomus) 28 277 303 303
tmrca(Chaetodontidae) 62 757 837 837
tmrca(Eobuglossus) 303 3018 3141 3141
tmrca(Eucoelopoma) 170 1712 1856 1856
tmrca(Homonotichthys) 29 465 489 489
tmrca(Mcconichthys) 57 673 733 733
tmrca(Ramphexocoetus) 167 730 803 803
tmrca(Tarkus) 1278 14121 14736 14736
tmrca(Mene) 156 1393 1557 1557
tmrca(Eastmanelepes) 318 5114 5366 5366
birthDeath.meanGrowthRate 48 584 626 626
birthDeath.relativeDeathRate 9717 12780 14498 14498
ucld.mean 17 245 262 262
ucld.stdev 102 711 783 783
meanRate 19 241 258 258
coefficientOfVariation 53 294 324 324
covariance 55 384 429 429
speciation 18 241 258 258

fastest2

parameter run1 (burnin = 5e6) run2 (burnin = 5e6) combined (Tracer) combined (LogAnalyser)
posterior 146 482 560 560
prior 66 270 309 309
likelihood 4600 38026 42378 42378
treeModel.rootHeight 61 256 289 289
tmrca(Aipichthys) 96 687 773 773
tmrca(Berybolcensis) 229 2212 2477 2477
tmrca(Calatomus) 38 309 340 340
tmrca(Chaetodontidae) 92 933 1024 1024
tmrca(Eobuglossus) 255 4125 4186 4186
tmrca(Eucoelopoma) 281 2536 2819 2819
tmrca(Homonotichthys) 67 603 669 669
tmrca(Mcconichthys) 282 810 936 936
tmrca(Ramphexocoetus) 94 696 781 781
tmrca(Tarkus) 1671 14391 15934 15934
tmrca(Mene) 106 1476 1572 1572
tmrca(Eastmanelepes) 722 6860 7598 7598
birthDeath.meanGrowthRate 235 521 608 608
birthDeath.relativeDeathRate 2151 12258 13525 13525
ucld.mean 37 211 238 2.691E-3
ucld.stdev 136 616 712 712
meanRate 51 220 248 2.1962E-3
coefficientOfVariation 27 229 249 249
covariance 22 437 437 437
speciation 51 222 251 251

fastest3

parameter run1 (burnin = 5e6) run2 (burnin = 5e6) combined (Tracer) combined (LogAnalyser)
posterior 85 617 670 670
prior 37 310 336 336
likelihood 3535 38339 41780 41780
treeModel.rootHeight 36 330 357 357
tmrca(Aipichthys) 26 470 481 481
tmrca(Berybolcensis) 221 2280 2503 2503
tmrca(Calatomus) 31 316 347 347
tmrca(Chaetodontidae) 87 650 739 739
tmrca(Eobuglossus) 193 1951 2169 2169
tmrca(Eucoelopoma) 196 1312 1451 1451
tmrca(Homonotichthys) 214 370 384 384
tmrca(Mcconichthys) 94 806 248 248
tmrca(Ramphexocoetus) 86 946 1025 1025
tmrca(Tarkus) 999 10574 11611 11611
tmrca(Mene) 247 1348 1518 1518
tmrca(Eastmanelepes) 393 4983 5321 5321
birthDeath.meanGrowthRate 87 720 778 778
birthDeath.relativeDeathRate 1954 15661 16300 16300
ucld.mean 19 242 255 255
ucld.stdev 47 600 633 633
meanRate 26 254 275 275
coefficientOfVariation 66 270 291 291
covariance 59 256 291 291
speciation 26 254 275 275

As can be seen, the ESS values reported by Tracer and LogAnalyser are identical for most parameters. All the parameters for which the results of the two programs differ exhibit effective sample sizes that exceed 200 in Tracer but do not reach this threshold in LogAnalyser. Note that the parameter means are estimated identically by both programs even in these cases. Here, it is assumed that the conflicting values are reported incorrectly by LogAnalyser, particularly with regard to the fact that the ESS values shown for the ucld.mean and meanRate parameters of the fastest2 partition imply unrealistically large autocorrelation times on the order of \(10^{11}\) and are four orders of magnitude lower than the sum of the ESS values obtained from the partial analyses (i.e., in runs 1 and 2).

Therefore, the analyses were regarded as completed, and the respective .trees files from each pair of runs were combined under the same settings as the log files above:

java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogCombiner -burnin 5000000 -trees fastest1.trees fastest1-run2.trees fastest1-combined.trees
java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogCombiner -burnin 5000000 -trees fastest2.trees fastest2-run2.trees fastest2-combined.trees
java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogCombiner -burnin 5000000 -trees fastest3.trees fastest3-run2.trees fastest3-combined.trees

The results were then summarized using TreeAnnotator:

java -Xmx20000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.TreeAnnotator -burnin 0 -heights median fastest1-combined.trees fastest1.tre
java -Xmx20000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.TreeAnnotator -burnin 0 -heights median fastest2-combined.trees fastest2.tre
java -Xmx20000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.TreeAnnotator -burnin 0 -heights median fastest3-combined.trees fastest3.tre
fastest2 BEAST time tree

fastest2 BEAST time tree

fastest2 BEAST time tree fastest2 BEAST time tree

Analysis Protocol. Phase 2 – Root Calibration

Unlike PhyloBayes and mcmcTREE, BEAST does not require the user to place a calibration on the root of the tree. Since no root calibration was included among the 12 fossil calibrations listed and justified in Alfaro et al. (in prep.; see Supplemental Figure S3), this option was made use of, resulting in trees with excessively old deep nodes. To correct for the unrealistic age estimates, a second round of analyses was launched with a root calibration added to the remaining 12. As in the PhyloBayes and mcmcTREE analyses, the lower bound on the root age was set equal to 98 Ma, with the corresponding upper bound at 143 Ma. For the BEAST analyses, the minimum was treated as hard and the maximum as soft, albeit slightly “harder” than the maxima placed on the remaining calibrations. This was achieved by treating it as the 97.5th rather than 95th percentile:

root <- (143 - 98)/qexp(0.975)
paste("Root", root, sep = " = ")
## [1] "Root = 12.1988263806818"

As seen above, the resulting density has the desirable property of being more heavy-tailed than the density assigned to the second oldest calibration in the tree (the Lampris/Aphredoderus node calibrated by Aipichthys). The Lampris/Aphredoderus clade age is therefore relatively more likely to take on values closer to the minimum of 98 Ma, while the root age has a relatively higher probability of being closer to the maximum of 143 Ma. This conforms to the biological requirement that ancestors be older than their descendants.

Aside from the introduction of a root calibration, the only difference between the first and second round of BEAST analyses concerned operator settings. Since ucld.mean and ucld.stdev were among the least well-sampled parameters in the first round of analyses, their operators received an increased tuning setting of 0.9 (default value: 0.75) and an increased weight of 6.0 (default value: 3.0) in the second round. These changes only concern the behavior of the MCMC sampler, and therefore do not affect the comparability of the resulting estimates.

Based on the autocorrelation times observed in the first round of analyses, the length of the chain was set to 500 million generations for all the three analyzed partitions (fastest1, fastest2, fastest3), with the sampling frequency kept at 1 sample per 1000 generations. The analyses were launched using the following commands on Friday 01/27/2017, 8:43 PM (fastest1rootcal), 8:47 PM (fastest2rootcal), 8:49 PM (fastest3rootcal):

beast -threads 12 -beagle_SSE fastest1rootcal.xml
beast -threads 12 -beagle_SSE fastest2rootcal.xml
beast -threads 12 -beagle_SSE fastest3rootcal.xml

The chains reached their target length after 23.749 days (fastest1rootcal), 18.487 days (fastest2rootcal), and 24.822 days (fastest3rootcal), with the operator analyses yielding the results below:

Operator analysis – fastest1rootcal

Operator Tuning Count Time Time/Op Pr(accept)
scale(ac) 0.824 6116198 69919813 11.43 0.2199
scale(ag) 0.899 6113223 69888056 11.43 0.2172
scale(at) 0.841 6113975 69908670 11.43 0.2194
scale(cg) 0.858 6109667 69851372 11.43 0.2353
scale(gt) 0.833 6112697 69886839 11.43 0.2297
frequencies 0.014 6111604 69872193 11.43 0.2315
scale(alpha) 0.698 6110480 69761039 11.42 0.2322
scale(ucld.mean) 0.944 36669057 418799547 11.42 0.2274
scale(ucld.stdev) 0.956 36666945 364960853 9.95 0.222
scale(treeModel.rootHeight) 0.921 18325612 14342138 0.78 0.2226
uniform(nodeHeights(treeModel)) 183322777 350692465 1.91 0.1475
scale(birthDeath.meanGrowthRate) 0.55 18332873 2494408 0.14 0.2376
scale(birthDeath.relativeDeathRate) 0.143 18337848 2409742 0.13 0.2535
up:ucld.mean down:nodeHeights(treeModel) 0.966 18326956 127184168 6.94 0.233
swapOperator(branchRates.categories) 61106415 143560830 2.35 0.0917
uniformInteger(branchRates.categories) 61123673 109593515 1.79 0.1659

Operator analysis – fastest2rootcal

Operator Tuning Count Time Time/Op Pr(accept)
scale(ac) 0.817 6107430 52648293 8.62 0.2244
scale(ag) 0.889 6109772 52678774 8.62 0.2152
scale(at) 0.831 6107465 52661821 8.62 0.219
scale(cg) 0.85 6113335 52704317 8.62 0.2274
scale(gt) 0.821 6114513 52715590 8.62 0.2258
frequencies 0.015 6109295 52679348 8.62 0.2336
scale(alpha) 0.656 6115332 52583843 8.6 0.236
scale(ucld.mean) 0.941 36666356 315470882 8.6 0.2295
scale(ucld.stdev) 0.952 36666448 273498416 7.46 0.2223
scale(treeModel.rootHeight) 0.912 18337128 11233196 0.61 0.2175
uniform(nodeHeights(treeModel)) 183335452 287083670 1.57 0.1571
scale(birthDeath.meanGrowthRate) 0.552 18337883 2421391 0.13 0.2407
scale(birthDeath.relativeDeathRate) 0.149 18331103 2345343 0.13 0.265
up:ucld.mean down:nodeHeights(treeModel) 0.966 18337940 96210275 5.25 0.2343
swapOperator(branchRates.categories) 61108812 116351803 1.9 0.1018
uniformInteger(branchRates.categories) 61101736 89251300 1.46 0.1836

Operator analysis – fastest3rootcal

Operator Tuning Count Time Time/Op Pr(accept)
scale(ac) 0.845 6110036 71483642 11.7 0.2317
scale(ag) 0.902 6108468 71465827 11.7 0.217
scale(at) 0.844 6113742 71520575 11.7 0.2247
scale(cg) 0.853 6114443 71526662 11.7 0.2138
scale(gt) 0.834 6113813 71512263 11.7 0.2296
frequencies 0.014 6111004 71488987 11.7 0.2353
scale(alpha) 0.671 6108717 71335121 11.68 0.2385
scale(ucld.mean) 0.948 36662189 428234059 11.68 0.2308
scale(ucld.stdev) 0.96 36666124 335433111 9.15 0.2262
scale(treeModel.rootHeight) 0.916 18343791 16323942 0.89 0.2155
uniform(nodeHeights(treeModel)) 183319164 403954935 2.2 0.1489
scale(birthDeath.meanGrowthRate) 0.552 18337065 2549700 0.14 0.2398
scale(birthDeath.relativeDeathRate) 0.139 18326676 2460407 0.13 0.2464
up:ucld.mean down:nodeHeights(treeModel) 0.966 18335384 129993472 7.09 0.2319
swapOperator(branchRates.categories) 61120866 162668944 2.66 0.0901
uniformInteger(branchRates.categories) 61108518 126829868 2.08 0.1672

The results were examined in LogAnalyser with the following results:

java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogAnalyser -burnin 50000000 -ess fastest1rootcal.log
##                       statistic            mean      stdErr         ESS
## 1                     posterior -130727.8863000 15.13010000    607.4022
## 2                         prior   -1955.6506000 10.43390000    293.9210
## 3                    likelihood -128772.2357000 11.12130000  26500.6609
## 4          treeModel.rootHeight     180.9216000 18.34330000    304.8281
## 5             tmrca(Aipichthys)     134.3377000 14.85950000    462.6585
## 6          tmrca(Berybolcensis)      60.2411000 11.17370000   1399.6284
## 7              tmrca(Calatomus)      25.3585000  8.16650000    249.5550
## 8         tmrca(Chaetodontidae)      49.6873000 13.21410000    567.2447
## 9          tmrca(Eastmanelepes)      51.8293000  2.72680000   5125.4621
## 10           tmrca(Eobuglossus)      43.9373000  2.58150000   2286.4395
## 11           tmrca(Eucoelopoma)      57.0837000  2.97570000    988.3860
## 12        tmrca(Homonotichthys)     102.9297000  7.58240000    534.3246
## 13          tmrca(Mcconichthys)      67.9045000  4.47390000    605.2742
## 14                  tmrca(Mene)      63.3960000  6.54420000    911.6944
## 15        tmrca(Ramphexocoetus)      60.4203000  8.78960000    700.4769
## 16                tmrca(Tarkus)      50.1588000  1.15460000   6325.0426
## 17                  tmrca(Root)     180.9216000 18.34330000    304.8281
## 18    birthDeath.meanGrowthRate       0.0162000  0.00196880    885.6137
## 19 birthDeath.relativeDeathRate       0.0705000  0.06510000  94695.8974
## 20                           ac       0.1948000  0.00617880 215900.0000
## 21                           ag       0.9825000  0.02830000  81148.0984
## 22                           at       0.2465000  0.00746030 167540.0000
## 23                           cg       0.2477000  0.00699960 162220.0000
## 24                           gt       0.1918000  0.00620960 164810.0000
## 25                 frequencies1       0.2191000  0.00307170 113350.0000
## 26                 frequencies2       0.2796000  0.00348230  95910.2719
## 27                 frequencies3       0.2740000  0.00352670  96100.3316
## 28                 frequencies4       0.2273000  0.00307080 115190.0000
## 29                        alpha       8.5260000  0.46400000 408560.0000
## 30                    ucld.mean       0.0029742  0.00028735    221.1752
## 31                   ucld.stdev       0.9491000  0.03470000    706.5774
## 32                     meanRate       0.0023827  0.00015022    246.5243
## 33       coefficientOfVariation       1.1973000  0.08460000    270.5419
## 34                   covariance       0.1959000  0.06250000    282.6541
## 35               treeLikelihood -128772.2357000 11.12130000  26500.6609
## 36                   speciation    -596.8042000  7.41600000    246.6509
java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogAnalyser -burnin 50000000 -ess fastest2rootcal.log
##                       statistic            mean      stdErr         ESS
## 1                     posterior -106365.5556000 15.40380000    522.4109
## 2                         prior   -1957.5347000 11.08470000    269.3418
## 3                    likelihood -104408.0209000 10.99870000  37114.1447
## 4          treeModel.rootHeight     180.7353000 20.05020000    269.2439
## 5             tmrca(Aipichthys)     127.3942000 13.54010000    613.5138
## 6          tmrca(Berybolcensis)      60.0970000 10.90220000   1974.9382
## 7              tmrca(Calatomus)      23.5559000  7.51270000    311.7914
## 8         tmrca(Chaetodontidae)      46.7649000 12.11520000    916.3662
## 9          tmrca(Eastmanelepes)      51.6495000  2.57560000   5502.6245
## 10           tmrca(Eobuglossus)      43.7849000  2.50190000   3151.6457
## 11           tmrca(Eucoelopoma)      57.5543000  3.42390000   1856.2798
## 12        tmrca(Homonotichthys)     102.2346000  7.30590000    644.6914
## 13          tmrca(Mcconichthys)      68.5515000  5.09490000    414.6174
## 14                  tmrca(Mene)      63.1729000  6.34290000   1284.0863
## 15        tmrca(Ramphexocoetus)      58.6012000  7.93930000   1061.6739
## 16                tmrca(Tarkus)      50.2542000  1.22650000  12594.1819
## 17                  tmrca(Root)     180.7353000 20.05020000    269.2439
## 18    birthDeath.meanGrowthRate       0.0164000  0.00205960    736.8404
## 19 birthDeath.relativeDeathRate       0.0728000  0.06690000  57840.2356
## 20                           ac       0.2325000  0.00782330 215610.0000
## 21                           ag       0.9843000  0.03020000  86311.0508
## 22                           at       0.2831000  0.00914440 170380.0000
## 23                           cg       0.3228000  0.00954740 162640.0000
## 24                           gt       0.2314000  0.00795450 169400.0000
## 25                 frequencies1       0.2226000  0.00330600 123390.0000
## 26                 frequencies2       0.2674000  0.00365870  99894.3990
## 27                 frequencies3       0.2786000  0.00374410 103710.0000
## 28                 frequencies4       0.2313000  0.00338230 122530.0000
## 29                        alpha      10.5904000  0.69310000 403880.0000
## 30                    ucld.mean       0.0033058  0.00033704    242.3424
## 31                   ucld.stdev       0.9459000  0.03710000    508.3728
## 32                     meanRate       0.0026357  0.00018346    231.1054
## 33       coefficientOfVariation       1.1671000  0.09060000    242.4046
## 34                   covariance       0.1898000  0.06440000    331.9766
## 35               treeLikelihood -104408.0209000 10.99870000  37114.1447
## 36                   speciation    -595.0462000  8.20300000    229.6069
java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogAnalyser -burnin 50000000 -ess fastest3rootcal.log

As the effective sample size of all parameters exceeded the threshold of 200, the tree distribution was summarized in TreeAnnotator using the following commands:

java -Xmx20000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.TreeAnnotator -burnin 50000000 -heights median fastest1rootcal.trees fastest1rootcal.tre
java -Xmx20000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.TreeAnnotator -burnin 50000000 -heights median fastest2rootcal.trees fastest2rootcal.tre
fastest1rootcal BEAST time tree

fastest1rootcal BEAST time tree

fastest2rootcal BEAST time tree

fastest2rootcal BEAST time tree

Analysis Protocol. Phase 3 – Random Local Clock

The BEAST analyses performed in this step differed from those described above in the following features: (1) a root calibration was added to the original calibration set (as in Phase 2); (2) the clock model was changed from the lognormally distributed relaxed clock to random local clocks; and (3) the upper bound of the Homonotichthys calibration was decreased to 116.35 Ma by specifying a new mean for the corresponding offset exponential prior:

## [1] "Homonotichthys = 7.49399410561025"

The analysis was run under the fixed topology operator setup, all of whose components were kept at their default values. The chain length was set equal to 500 million generations, with a sampling frequency of 1 sample per 1000 generations.

beast -threads 12 -beagle_SSE fastest1rlc.xml
beast -threads 12 -beagle_SSE fastest2rlc.xml
beast -threads 12 -beagle_SSE fastest3rlc.xml

PAML / mcmcTREE Analysis

As noted in the mcmcTREE manual, several prior settings are of special interest when creating the control file: RootAge, rgene_gamma, and sigma2_gamma. Following Alfaro et al., the following prior was placed on the age of the root:

RootAge = 'B(9.8, 14.3, 1e-300, 0.05)'

To compute rgene_gamma, the baseml program from the PAML package was used to infer a mean substitution rate under the assumption of a strict clock and an age of 113.02 Ma for the root. This age was obtained as the mean of an exponential distribution with an offset of 98 Ma and the 95th percentile equal to 143 Ma:

mean <- 98 + (143 - 98)/qexp(0.95)
mean
## [1] 113.0214

A modified version of the tree file was used for the baseml analysis, with a single point calibration assigned to the root, in order to arrive at an estimate of the absolute substitution rate:

118 1
(((lampris_guttatus,(zu_elongatus,regalecus_glesne)),((polymixia_lowei,(percopsis_omiscomycus,aphredoderus_sayanus)),((zeus_faber,zenopsis_conchifera),(stylephorus_chordatus,(coryphaenoides_rupestris,gadus_morhua))))),(anoplogaster_cornuta,((rondeletia_loricata,(sargocentron_coruscum2,(myripristis_violacea,myripristis_leiognathus))),((carapus_bermudensis,lepophidium_profundorum),(batrachoides_pacifici,((kurtus_gulliveri,(((apogon_lateralis,ostorhinchus_nigrofasciatus),(apogon_maculatus,phaeoptyx_pigmentaria)),(odontobutis_obscura,((mogurnda_adspersa,erotelis_smaragdus),(ophiocara_porocephala,((bathygobius_soporator,(valenciennea_strigata,ptereleotris_evides)),(gillichthys_seta,(evorthodus_minutus,periophthalmus_barbarus)))))))),((((syngnathus_fuscus,aulostomus_sp),(synchiropus_ocellatus,(dactyloptena_orientalis,(parupeneus_multifasciatus,pseudupeneus_maculatus)))),((chiasmodon_niger,kali_normani),((scomber_scombrus,scomberomorus_maculatus),(((taractes_asper,brama_japonica),(ruvettus_pretiosus,assurger_anzac)),(psenopsis_anomala,(pomatomus_saltatrix,(cubiceps_baxteri,psenes_cyanophrys))))))),((((xenentodon_cancila,(pholidichthys_leucotaenia,pterophyllum_leopoldi)),((plesiops_sp,(pseudochromis_flavivertex,pseudochromis_sankeyi)),(chromis_enchrysura,(gramma_loreto,(scartella_cristata,(labrisomus_nuchipinnis,emblemariopsis_sp)))))),((centropomus_medius,sphyraena_putnamae),((polydactylus_sexfilis,(citharoides_macrolepis,((hippoglossina_oblonga,bothus_pantherinus),(parachirus_xenicus,poecilopsetta_plinthus)))),((toxotes_jaculatrix,(istiophorus_platypterus,mene_maculatus)),((coryphaena_hippurus,rachycentron_canadum),(seriola_zonata,(alectis_indicus,(caranx_melampygus,(chloroscombrus_orqueta,alepes_kleinii))))))))),((cephalopholis_argus,(hypoplectrus_puela,(taenianotus_triacanthus,pseudanthias_squamipinnis))),(micropterus_salmoides,((pempheris_schomburgkii,(cheimarrichthys_fosteri,((thalassoma_ballieui,(halichoeres_poeyi,halichoeres_radiatus)),(epibulus_insidiator,(scarus_trispinosus,scarus_ferrigineus))))),(((pareques_acuminatus,micropogonias_undulatus),(lutjanus_goreensis,(haemulon_album,(haemulon_flavilineatum,haemulon_parra)))),(((pomacanthus_paru,(chaetodon_ocellatus,chaetodon_kleinii)),(naso_unicornis,(acanthurus_pyroferus,(acanthurus_bahianus,(acanthurus_bariene,acanthurus_olivaceus))))),(antigonia_capros,((antennarius_striatus,(ogcocephalus_radiatus,cryptopsaras_couesii)),(balistes_capriscus,(takifugu_occelatus,(tetraodon_suvatti,(canthigaster_margaritata,(canthigaster_rostrata,canthigaster_figueiredoi))))))))))))))))))))'@1.1302';

Baseml estimates the following substitution rates (per 1 time unit, i.e., 100 million years):

partition fastest1 largest1
rate 0.427164 0.023614

These values compare well to Alfaro et al.’s estimate of 0.098170 for the entire dataset, since fastest1 and largest1 represent unusually fast and unusually slow (respectively) subsets thereof.

Based on the mcmcTREE manual and following Alfaro et al., \(\alpha\) (the shape parameter of the gamma distribution) was set to 2 and \(\beta\) (rate parameter) was chosen so that the mean \(\frac{\alpha}{\beta}\) was equal to the respective baseml estimate rescaled per 10 million (rather than 100 million) years:

## [1] "beta_fastest1 = 46.8204249421768"
## [1] "beta_largest1 = 846.955196070128"
parameter fastest1 largest1
\(\alpha\) (shape) 2 2
\(\beta\) (rate) 46.82 846.96

corresponding to the following priors:

Note that the sigma2_gamma prior employed here is highly consistent with the ucld.stdev prior selected for the BEAST analyses (a truncated exponential with a mean of 0.3). If the standard deviation of the logarithm of the rate is to be 0.3, the variation of the logarithm of the rate must be set to 0.09. Therefore, the mean \(\left(\frac{\alpha}{\beta}\right)\) of the gamma prior on log rate variance should be equal to 0.09. Since a gamma distribution reduces to an exponential when \(\alpha\) is set to 1, one can approximate the BEAST prior by setting \(\alpha\) to 1 and \(\beta\) to 1/0.09, which is here rounded to 10.

Since the independent rates of the relaxed clock model are drawn from a lognormal distribution whose mean \(\mu\) and variance \(\sigma^2\) are determined by the rgene_gamma and sigma2_gamma hyperpriors, the PAML manual recommends plotting this distribution to make sure it is reasonable:

\[ \begin{align*} \mu &\sim \Gamma\left(\alpha_1,\beta_1\right) & \mathbb{E}[\mu] &= \frac{\alpha_1}{\beta_1} & \mathrm{Var}[\mu] &= \frac{\alpha_1}{{\beta_1}^2} \\ \sigma^2 &\sim \Gamma\left(\alpha_2,\beta_2\right) & \mathbb{E}[\sigma^2] &= \frac{\alpha_2}{{\beta_2}} & \mathrm{Var}[\sigma^2] &= \frac{\alpha_2}{{\beta_2}^2} \end{align*} \]

For fastest1, this corresponds to:

alpha_1 <- 2
beta_1 <- 46.82
alpha_2 <- 1
beta_2 <- 10
## Note that while mu is the mean substitution rate, sigma squared is the variance
## of the logarithm of the rate rather than simply the variance of the rate.
## Note also that since alpha_2/beta_2 gives the prior mean of the variance, the
## prior mean of the corresponding st. dev. is equal to the square root of the ratio.
curve(dlnorm(x, meanlog = log(alpha_1/beta_1), sdlog = sqrt(alpha_2/beta_2)), from=0, to=1, 
      main = "Relaxed clock rate distribution", xlab = "Rate", ylab = "Probability density")

The prior choices described above were implemented in the following control file (for fastest1; the control file for largest1 only differed in the values of rgene_gamma):

          seed = -1
       seqfile = 3ad39b9161154796bf2aef1aa746b469.phy
      treefile = 12_cali_no_outgroups_scheme_3.tre
       outfile = fastest1.txt

         ndata = 1                             * No further partitioning; each run corresponds to 1 partition
       seqtype = 0                             * Data type: nucleotides
       usedata = 3                             * Store the Hessian matrix for approximate likelihood computation in out.BV
         clock = 2                             * Independent rates
       RootAge = 'B(9.8, 14.3, 1e-300, 0.05)'  * P of being younger than 98 Ma = 10^(-300) and P of being older than 143 Ma = 0.05

         model = 4                             * Most complex model available
         alpha = 0.1                           * Following Alfaro et al.
         ncatG = 8                             * Following Alfaro et al.

     cleandata = 0                             * Do not remove sites with ambiguity

       BDparas = 0.1 0.1 00.1                  * Birth, death, sampling
   kappa_gamma = 6 2                           * No effect since usedata = 3
   alpha_gamma = 1 1                           * No effect since usedata = 3

   rgene_gamma = 2 46.82                       * Gamma prior on substitution rate: estimated using baseml under strict clock
  sigma2_gamma = 1 10                          * Gamma prior on log rate variance: independent of time unit choice since clock = 2

      finetune = 1: .1 .1 .1 .1 .1 .1          * Auto finetune: times, musigma2, rates, mixing, paras, FossilErr

         print = 2                             * Print branch rates into an output file
        burnin = 500000                        * Following Alfaro et al.
      sampfreq = 500                           * Following Alfaro et al.
       nsample = 15000                         * Following Alfaro et al.

The mcmcTREE analyses were run as follows:

mcmctree fastest1mcmctree.ctl
mcmctree largest1mcmctree.ctl

The Hessian matrix was generated and stored in out.BV files (fastest1: time used = 4:29, largest1: time used = 36:58). These were renamed as in.BV, with the control files modified in vi to read as follows: usedata = 2. The analyses were then started again using the commands above, and finished in 70:29:15 (fastest1) and 67:28:12 (largest1), yielding the following trees:

fastest1 PAML time tree

fastest1 PAML time tree

largest1 PAML time tree

largest1 PAML time tree

Given the great disparity between the results of the two analyses, and their relatively fast run times, the remaining four partitions were subjected to the same sequence of analyses as outlined above. The following rate estimates were obtained using baseml:

partition fastest2 fastest3 largest2 largest3
rate 0.056057 0.372665 0.000100 0.029088

After rescaling, these can be used to calculate the hyperparameters of the rgene_gamma prior:

## [1] "beta_fastest2 = 356.779706370302"
## [1] "beta_fastest3 = 53.6675029852548"
## [1] "beta_largest2 = 200000"
## [1] "beta_largest3 = 687.568756875688"
parameter fastest2 fastest3 largest2 largest3
\(\alpha\) (shape) 2 2 2 2
\(\beta\) (rate) 356.78 53.67 200000 687.57

This yields the following prior distributions:

Note that a different x-axis scale was used for the largest2 prior, whose probability mass is concentrated into an extremely short interval close to 0.

New control files were created for fastest2 and fastest3 with settings identical to those in the first two files, except for the different values of the rgene_gamma \(\beta\) hyperparameter. The sigma2_gamma hyperparameters were left unmodified, with \(\sigma^2 \sim \Gamma\left(\alpha = 1, \beta = 10\right)\), except for the “largest2” partition, for which this setting would result in an extremely informative rate prior favoring values very close to 0. The sigma2_gamma hyperparameters for largest2 (\(\alpha = 1\), \(\beta = 0.1\)) were chosen so as to make the resulting lognormal density roughly comparable in shape to those used for the analyses of the other partitions:

After calculating the Hessian matrices (fastest2: time used = 3:34, fastest3: time used = 4:10, largest3: time used = 15:06), new analyses were started with usedata set to 2. The only exception was the “largest2” partition, whose baseml analysis finished with the following warning message: xmax = 0.0000e+00 close to zero at 235!. Given that this message was displayed in repeated trials, no attempt was made to proceed with mcmcTREE analysis. The analyses of the remaining partitions finished in 71:36:39 (fastest2), 71:14:32 (fastest3), and 49:52:29 (largest3) with the following results:

fastest2 PAML time tree

fastest2 PAML time tree

fastest3 PAML time tree

fastest3 PAML time tree

largest3 PAML time tree

largest3 PAML time tree

Analysis Protocol. Phase 2 – Extended Calibrations

A new set of mcmcTREE analyses was run on the “fastest1”, “fastest2”, “fastest3”, “largest1”, and “largest3” partitions under the same extended set of calibrations used in Phase 4 of the PhyloBayes analyses. These were incorporated into the following tree file:

118 1 
(((lampris_guttatus,(zu_elongatus,regalecus_glesne)'B(0.533,9.4,1e-300,0.05)')'B(5.417,12.25,1e-300,0.05)',((polymixia_lowei,(percopsis_omiscomycus,aphredoderus_sayanus)'B(5.8551,10.96,1e-300,0.05)')'B(9.39,12.8,1e-300,0.05)',((zeus_faber,zenopsis_conchifera)'B(3.202,9.14,1e-300,0.05)',(stylephorus_chordatus,(coryphaenoides_rupestris,gadus_morhua)'B(5.37,9.44,1e-300,0.05)'))'B(6.971,11.13,1e-300,0.05)'))'B(9.8,14.3,1e-300,0.05)',(anoplogaster_cornuta,((rondeletia_loricata,(sargocentron_coruscum2,(myripristis_violacea,myripristis_leiognathus))'B(4.9,9.37,1e-300,0.05)'),((carapus_bermudensis,lepophidium_profundorum),(batrachoides_pacifici,((kurtus_gulliveri,(((apogon_lateralis,ostorhinchus_nigrofasciatus),(apogon_maculatus,phaeoptyx_pigmentaria)),(odontobutis_obscura,((mogurnda_adspersa,erotelis_smaragdus),(ophiocara_porocephala,((bathygobius_soporator,(valenciennea_strigata,ptereleotris_evides)),(gillichthys_seta,(evorthodus_minutus,periophthalmus_barbarus)))))))),((((syngnathus_fuscus,aulostomus_sp)'B(4.9,9.37,1e-300,0.05)',(synchiropus_ocellatus,(dactyloptena_orientalis,(parupeneus_multifasciatus,pseudupeneus_maculatus))))'B(6.971,11.13,1e-300,0.05)',((chiasmodon_niger,kali_normani),((scomber_scombrus,scomberomorus_maculatus)'B(5.417,9.42,1e-300,0.05)',(((taractes_asper,brama_japonica),(ruvettus_pretiosus,assurger_anzac)),(psenopsis_anomala,(pomatomus_saltatrix,(cubiceps_baxteri,psenes_cyanophrys))))))),((((xenentodon_cancila,(pholidichthys_leucotaenia,pterophyllum_leopoldi)'B(4.546,6.74,1e-300,0.05)')'B(4.9,7.97,1e-300,0.05)',((plesiops_sp,(pseudochromis_flavivertex,pseudochromis_sankeyi)),(chromis_enchrysura,(gramma_loreto,(scartella_cristata,(labrisomus_nuchipinnis,emblemariopsis_sp)))))),((centropomus_medius,sphyraena_putnamae)'B(4.9,7.97,1e-300,0.05)',((polydactylus_sexfilis,(citharoides_macrolepis,((hippoglossina_oblonga,bothus_pantherinus),(parachirus_xenicus,poecilopsetta_plinthus)'B(4.12,6.66,1e-300,0.05)'))'B(4.9,7.97,1e-300,0.05)'),((toxotes_jaculatrix,(istiophorus_platypterus,mene_maculatus)'B(5.52,9.46,1e-300,0.05)'),((coryphaena_hippurus,rachycentron_canadum),(seriola_zonata,(alectis_indicus,(caranx_melampygus,(chloroscombrus_orqueta,alepes_kleinii))))'B(4.9,6.93,1e-300,0.05)')'B(5.417,8.08,1e-300,0.05)')))),((cephalopholis_argus,(hypoplectrus_puela,(taenianotus_triacanthus,pseudanthias_squamipinnis))),(micropterus_salmoides,((pempheris_schomburgkii,(cheimarrichthys_fosteri,((thalassoma_ballieui,(halichoeres_poeyi,halichoeres_radiatus)),(epibulus_insidiator,(scarus_trispinosus,scarus_ferrigineus))'B(1.19,6.33,1e-300,0.05)'))),(((pareques_acuminatus,micropogonias_undulatus),(lutjanus_goreensis,(haemulon_album,(haemulon_flavilineatum,haemulon_parra)))),(((pomacanthus_paru,(chaetodon_ocellatus,chaetodon_kleinii))'B(2.962,6.6,1e-300,0.05)',(naso_unicornis,(acanthurus_pyroferus,(acanthurus_bahianus,(acanthurus_bariene,acanthurus_olivaceus))))'B(4.9,6.93,1e-300,0.05)')'B(5.417,8.08,1e-300,0.05)',(antigonia_capros,((antennarius_striatus,(ogcocephalus_radiatus,cryptopsaras_couesii)'B(4.9,6.16,1e-300,0.05)'),(balistes_capriscus,(takifugu_occelatus,(tetraodon_suvatti,(canthigaster_margaritata,(canthigaster_rostrata,canthigaster_figueiredoi))))'B(3.202,5.85,1e-300,0.05)')'B(5.05,6.96,1e-300,0.05)'))))))))))))))'B(9.39,12.8,1e-300,0.05)')'B(9.8,14.3,1e-300,0.05)';

All other settings were identical to those used for the first round of mcmcTREE analyses, including the rgene_gamma and sigma2_gamma hyperprior values. However, the RootAge line was deleted from the control file, as the corresponding calibration density (a uniform density between 98 and 143 Ma, with soft bounds such that P(<98) = \(10^{-300}\) and P(>143) = 0.05) was already included in the tree file. This produced the resulting control file (fastest1 file shown as an example):

          seed = -1
       seqfile = 3ad39b9161154796bf2aef1aa746b469.phy
      treefile = 12_extended_cali_no_outgroups.tre
       outfile = fastest1ext.txt

         ndata = 1                             * No further partitioning; each run corresponds to 1 partition
       seqtype = 0                             * Data type: nucleotides
       usedata = 3                             * Store the Hessian matrix for approximate likelihood computation in out.BV
         clock = 2                             * Independent rates

         model = 4                             * Most complex model available
         alpha = 0.1                           * Following Alfaro et al.
         ncatG = 8                             * Following Alfaro et al.

     cleandata = 0                             * Do not remove sites with ambiguity

       BDparas = 0.1 0.1 00.1                  * Birth, death, sampling
   kappa_gamma = 6 2                           * No effect since usedata = 3
   alpha_gamma = 1 1                           * No effect since usedata = 3

   rgene_gamma = 2 46.82                       * Gamma prior on substitution rate: estimated using baseml under strict clock
  sigma2_gamma = 1 10                          * Gamma prior on log rate variance: independent of time unit choice since clock = 2

      finetune = 1: .1 .1 .1 .1 .1 .1          * Auto finetune: times, musigma2, rates, mixing, paras, FossilErr

         print = 2                             * Print branch rates into an output file
        burnin = 500000                        * Following Alfaro et al.
      sampfreq = 500                           * Following Alfaro et al.
       nsample = 15000                         * Following Alfaro et al.

The same procedure was used as in the first round of analyses, with the usedata value changed from 3 to 2 after generating the Hessian matrix, which records the second derivatives of the likelihood function at its maximum values. The out.BV file, in which the matrix was stored, was renamed as in.BV and served as an input for the second stage of the analysis (node age estimation proper).

The analyses finished in 39:32:49 (fastest1ext), 39:29:45 (fastest2ext), 39:24:17 (fastest3ext), and 36:33:56 (largest1ext; duration of the largest3ext analysis was not recorded) with the following results:

fastest1ext PAML time tree

fastest1ext PAML time tree

fastest2ext PAML time tree

fastest2ext PAML time tree

fastest3ext PAML time tree

fastest3ext PAML time tree

largest1ext PAML time tree

largest1ext PAML time tree

largest3ext PAML time tree

largest3ext PAML time tree

Analysis Protocol. Phase 3 – Truncated Cauchy priors

All of the mcmcTREE analyses above were performed with piecewise calibration densities that consisted of (1) a power-decay function left of the lower bound, (2) a uniform density between the lower and upper bounds, and (3) an exponential-decay function right of the upper bound. Different portions of the total probability mass can be allocated to these 3 densities: in the tree and control files shown above, these fractions were set equal to \(P = 10^{-300}\) for (1) and \(P =\) 0.05 for (3) to make the lower bounds as “hard” as possible while keeping the upper bounds relatively soft. However, it has been noted that calibrated divergences are generally more likely to take place shortly before the occurrence of the calibration fossil than to predate it by a large margin (Brown & van Tuinen 2011), suggesting that the uniform distribution might not be the best description of the uncertainty associated with divergence ages.

Analysis Protocol. Phase 4 – Corrected Calibrations

Similar to the PhyloBayes analyses described above, the mcmcTREE analyses were re-run under a corrected version of the original guide tree, in which the upper bound of the Homonotichthys calibration placed on the (Polymixia + Aphredoderus) node was decreased from 128.04 Ma to 116.35 Ma. All other settings were retained:

mcmctree fastest1corrmcmctree.ctl
mcmctree fastest2corrmcmctree.ctl
mcmctree fastest3corrmcmctree.ctl
mcmctree largest1corrmcmctree.ctl
mcmctree largest3corrmcmctree.ctl

The preliminary analyses produced the output in the form of a Hessian matrix in 2:32 (fastest1corr), 2:05 (fastest2corr), 2:11 (fastest3corr), 19:09 (largest1corr), and 11:52 (largest3corr), respectively. The matrix file were renamed from out.BV to in.BV and new mcmcTREE analyses were launched based on modified versions of the original control files, in which the value of usedata was changed from 3 to 2. These analyses finished in 40:18:24 (fastest1corr), 40:17:43 (fastest2corr), 38:56:28 (fastest3corr), 36:03:15 (largest1corr), and 37:21:11 (largest3corr), producing the results shown below:

fastest1corr PAML time tree

fastest1corr PAML time tree

fastest2corr PAML time tree

fastest2corr PAML time tree

fastest3corr PAML time tree

fastest3corr PAML time tree

largest1corr PAML time tree

largest1corr PAML time tree

largest3corr PAML time tree

largest3corr PAML time tree

Analysis of median-rate partitions

The partitions used for the analyses above seem to show a systematic bias: the fastest partitions tend to allocate evolutionary change to individual branches as evenly as possible, showing no rapid radiations. On the other hand, the largest partitions, which are also the slowest ones, tend to produce trees with explosive diversification bursts that are unrealistically comprehensive, often subsuming even intrageneric divergences that would be expected to be an order of magnitude younger (see PhyloBayes tree largest3ext and PAML trees largest3, largest1ext, and largest3ext above; note the ages of Acanthurus, Haemulon, Pseudochromis, and Scarus).

Therefore, it might be useful to infer time trees from partitions whose rates are neither extremely slow nor extremely fast, in order to see if they can provide more biologically realistic estimates that could detect rapid radiations at multiple time scales. Three such partitions were selected based on the absolute deviation from the median rate multiplier:

acanth <- read.delim("/Users/David/Grive/Alfaro_Lab/PartitionFinder/k-means/Acanthomorpha_Run_2/Rate_multipliers_Acanthomorpha.txt", header=TRUE)
colnames(acanth) <- c("Subset_ID", "Rate_multiplier", "Sites")

# Find the absolute deviation from the median
acanth$Diff_from_median <- 0
for(i in 1:nrow(acanth)) {
  acanth[i,"Diff_from_median"] <- abs(acanth[i,"Rate_multiplier"] - median(acanth$Rate_multiplier))
}

# Select the 3 partitions with rates closest to the median
head(acanth[order(acanth$Diff_from_median),], n=3)
##                           Subset_ID Rate_multiplier Sites Diff_from_median
## 23 2d1d94d7d53bf577c912ae0b72fe2274        1.421441  6442          0.05932
## 12 95d43d8ebf63945dc31116a80872c4ed        1.302801  5837          0.05932
## 27 f495e4e0f2f9bbbf091f778067b062f4        1.153401  6752          0.20872

PhyloBayes

The following preliminary (uncalibrated) PhyloBayes analyses were run in order to determine the hyperparameters of the birth-death process that is to be used as a branching-process prior in the subsequent calibrated runs:

../pb -d 2d1d94d7d53bf577c912ae0b72fe2274.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 median1nocal
../pb -d 95d43d8ebf63945dc31116a80872c4ed.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 median2nocal
../pb -d f495e4e0f2f9bbbf091f778067b062f4.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 median3nocal

The uncalibrated PhyloBayes runs were soft-stopped on Tuesday 02/07/2017, 0:56 AM (median1nocal, at 24665 generations), 00:58 AM (median2nocal, at 26673 generations), and 00:59 AM (median3nocal, at 23814 generations). The respective ESS values are shown in the table below:

parameter median1 median2 median3
states 24665 26673 23814
loglik 16074 17514 15944
length 12975 14133 14012
sigma 12107 11560 8863
mu 1130 1742 1889
meanrates 4357 6111 4592
p1 11488 14089 13425
p2 11732 19865 17218
alpha 17900 20523 17721
nmode - - -
stat 9578 10150 10546
statalpha - - -
rrent 12039 12712 12122
meanrr 1491 1828 1540
kappa - - -
allocent - - -

The parameters of interest – i.e., p1 and p2 – were found using the following commands. Note that the greatest autocorrelation times detected in the three analyses were 19.65 (median1: mu), 13.78 (median2: mu), and 13.92 (median3: meanrr), making the subsampling frequency applied in readdiv conservative.

../readdiv -x 2467 50 median1nocal
../readdiv -x 2667 50 median2nocal
../readdiv -x 2381 50 median3nocal

Resulting in the following estimates of the parameters of interest:

chain p1 p2 ESS
median1 0.00120075 +/- 0.000456566 9.00163e-05 +/- 7.51772e-05 443
median2 0.0016755 +/- 0.000425401 5.38323e-05 +/- 5.2924e-05 480
median3 0.00176085 +/- 0.000419111 4.24065e-05 +/- 3.91027e-05 428

Based on these results, calibrated analyses were launched using the corrected calibration file:

../pb -d 2d1d94d7d53bf577c912ae0b72fe2274.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00120075 0.0000900163 -cal calib_root_corrected -sb -gtr -dgam 4 median1cal
../pb -d 95d43d8ebf63945dc31116a80872c4ed.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.0016755 0.0000538323 -cal calib_root_corrected -sb -gtr -dgam 4 median2cal
../pb -d f495e4e0f2f9bbbf091f778067b062f4.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00176085 0.0000424065 -cal calib_root_corrected -sb -gtr -dgam 4 median3cal

The calibrated analyses were soft-stopped on 02/11/2017 at 11:38 PM (median1cal: 10826 generations) and on 02/12/2017 at 4:24 PM (median2cal: 14019 generations) and 4:27 PM (median3cal: 12654 generations). The following ESS values were obtained using Tracer:

parameter median1 median2 median3
states 10826 14019 12654
loglik 5836 6916 6427
length 5995 6648 6541
sigma 4452 7594 5470
mu 259 860 664
meanrates 503 579 224
p1 - - -
p2 - - -
scale 343 482 1309
alpha 8385 10124 9464
nmode - - -
stat 4695 5952 5944
statalpha - - -
rrent 5511 6854 6563
meanrr 819 955 710
kappa - - -
allocent - - -

Based on the greatest recorded autocorrelation times of 37.59 (median1: mu), 26.18 (median2: scale), and 50.85 (median3: meanrates), readdiv was run as follows:

../readdiv -x 1083 50 median1cal
../readdiv -x 1402 50 median2cal
../readdiv -x 1265 55 median3cal

Producing the trees below:

median1 PhyloBayes time tree

median1 PhyloBayes time tree

median2 PhyloBayes time tree

median2 PhyloBayes time tree

median3 PhyloBayes time tree

median3 PhyloBayes time tree

The posterior mean ages listed in the corresponding _sample.dates were examined to detect possible underflows (bold) and overflows (italics) with regard to the specified calibrations:

Calibration Lower bound Upper bound Node number median1 median2 median3
Aipichthys 98.0 128.82 119 117.529 109.728 111.156
Berybolcensis 49.0 109.29 132 76.1029 65.009 70.1035
Calatomus 11.9 43.95 210 38.0123 38.3904 40.3444
Chaetodontidae 29.62 59.26 220 57.9509 58.2777 57.9137
Eobuglossus 41.2 53.88 189 50.1875 50.5963 49.0592
Eucoelopoma 54.17 95.58 162 84.971 65.6994 74.1213
Homonotichthys 93.9 116.35 123 108.916 99.1346 100.816
Mcconichthys 63.1 93.51 124 41.6338 42.3599 43.1568
Ramphexocoetus 49.0 80.52 173 76.3036 78.6618 78.9618
Tarkus 49.0 53.93 229 51.0414 51.9401 51.7883
Mene 55.2 95.64 192 93.8209 91.1315 90.3707
Eastmanelepes 49.0 61.61 195 58.7614 58.2695 57.9235

More detailed information about the underflows and overflows is available in the corresponding _sample.calib files.

PAML / mcmcTREE

As in the previously used partitions, the rgene_gamma prior was computed by running the baseml program from the PAML package on a tree whose root was calibrated by a point age of 113.02 Ma. All the settings specified in the baseml control files were retained.

The baseml analyses finished in 26:51:37 (median1), 9:51:01 (median2), and 28:04:54 (median3), with the following substitution rate estimates (per a time unit of 100 million years):

partition median1 median2 median3
rate 0.154514 0.141265 0.128926

These were used to determine the values of the rgene_gamma prior in the following way (assuming an \(\alpha\) value of 2):

## [1] "beta_median1 = 129.438109168101"
## [1] "beta_median2 = 141.57788553428"
## [1] "beta_median3 = 155.127747700231"

Resulting in the following estimates of \(\beta\):

parameter median1 median2 median3
\(\alpha\) (shape) 2 2 2
\(\beta\) (rate) 129.44 141.58 155.13

The sigma2_gamma prior was identical to that used in the previous analyses (i.e., a gamma distribution with the shape parameter equal to 1 and the rate parameter equal to 10). The priors placed on the mean substitution rate of each partition are shown below, as are the overall lognormal rate distribution whose mean and variance are determined by the means of the respective hyperpriors (i.e., by the means of rgene_gamma and sigma2_gamma, respectively):

These values of rgene_gamma and sigma2_gamma were subsequently incorporated into a new set of mcmcTREE control files, such as the one shown below (for median1). Note that all of the analyses were run on the corrected tree, for which the soft upper bound on the Homonotichthys calibration was lowered to 116.35 Ma.

          seed = -1
       seqfile = 2d1d94d7d53bf577c912ae0b72fe2274.phy
      treefile = 12_cali_no_outgroups_corrected.tre
       outfile = median1.txt

         ndata = 1                             * No further partitioning; each run corresponds to 1 partition
       seqtype = 0                             * Data type: nucleotides
       usedata = 3                             * Store the Hessian matrix for approximate likelihood computation in out.BV
         clock = 2                             * Independent rates
       RootAge = 'B(9.8, 14.3, 1e-300, 0.05)'  * P of being younger than 98 Ma = 10^(-300) and P of being older than 143 Ma = 0.05

         model = 4                             * Most complex model available
         alpha = 0.1                           * Following Alfaro et al.
         ncatG = 8                             * Following Alfaro et al.

     cleandata = 0                             * Do not remove sites with ambiguity

       BDparas = 0.1 0.1 00.1                  * Birth, death, sampling
   kappa_gamma = 6 2                           * No effect since usedata = 3
   alpha_gamma = 1 1                           * No effect since usedata = 3

   rgene_gamma = 2 129.44                      * Gamma prior on substitution rate: estimated using baseml under strict clock
  sigma2_gamma = 1 10                          * Gamma prior on log rate variance: independent of time unit choice since clock = 2

      finetune = 1: .1 .1 .1 .1 .1 .1          * Auto finetune: times, musigma2, rates, mixing, paras, FossilErr

         print = 2                             * Print branch rates into an output file
        burnin = 500000                        * Following Alfaro et al.
      sampfreq = 500                           * Following Alfaro et al.
       nsample = 15000                         * Following Alfaro et al.

The mcmcTREE analyses were run as follows:

mcmctree median1mcmctree.ctl
mcmctree median2mcmctree.ctl
mcmctree median3mcmctree.ctl

The Hessian matrices of the likelihood functions were generated in 4:55 (median1), 1:01:07 (median2), and 14:15 (median3). The respective files were renamed as in.BV and the “usedata” line was set to a value of 2 in in the original control files. The analyses were then re-launched using commands identical to those shown above and finished in 39:47:21 (median1), 39:39:56 (median2), and 39:28:41 (median3) with the following results:

median1 PAML time tree

median1 PAML time tree

median2 PAML time tree

median2 PAML time tree

median3 PAML time tree

median3 PAML time tree

All 3 partitions were then re-analyzed under the same prior settings, but with the extended list of calibrations. Note that the guide tree specifying the extended set of calibrations also misspecified the upper bound on the (Polymixia + Aphredoderus) node, and had to be corrected:

118 1 
(((lampris_guttatus,(zu_elongatus,regalecus_glesne)'B(0.533,9.4,1e-300,0.05)')'B(5.417,12.25,1e-300,0.05)',((polymixia_lowei,(percopsis_omiscomycus,aphredoderus_sayanus)'B(5.8551,10.96,1e-300,0.05)')'B(9.39,11.635,1e-300,0.05)',((zeus_faber,zenopsis_conchifera)'B(3.202,9.14,1e-300,0.05)',(stylephorus_chordatus,(coryphaenoides_rupestris,gadus_morhua)'B(5.37,9.44,1e-300,0.05)'))'B(6.971,11.13,1e-300,0.05)'))'B(9.8,14.3,1e-300,0.05)',(anoplogaster_cornuta,((rondeletia_loricata,(sargocentron_coruscum2,(myripristis_violacea,myripristis_leiognathus))'B(4.9,9.37,1e-300,0.05)'),((carapus_bermudensis,lepophidium_profundorum),(batrachoides_pacifici,((kurtus_gulliveri,(((apogon_lateralis,ostorhinchus_nigrofasciatus),(apogon_maculatus,phaeoptyx_pigmentaria)),(odontobutis_obscura,((mogurnda_adspersa,erotelis_smaragdus),(ophiocara_porocephala,((bathygobius_soporator,(valenciennea_strigata,ptereleotris_evides)),(gillichthys_seta,(evorthodus_minutus,periophthalmus_barbarus)))))))),((((syngnathus_fuscus,aulostomus_sp)'B(4.9,9.37,1e-300,0.05)',(synchiropus_ocellatus,(dactyloptena_orientalis,(parupeneus_multifasciatus,pseudupeneus_maculatus))))'B(6.971,11.13,1e-300,0.05)',((chiasmodon_niger,kali_normani),((scomber_scombrus,scomberomorus_maculatus)'B(5.417,9.42,1e-300,0.05)',(((taractes_asper,brama_japonica),(ruvettus_pretiosus,assurger_anzac)),(psenopsis_anomala,(pomatomus_saltatrix,(cubiceps_baxteri,psenes_cyanophrys))))))),((((xenentodon_cancila,(pholidichthys_leucotaenia,pterophyllum_leopoldi)'B(4.546,6.74,1e-300,0.05)')'B(4.9,7.97,1e-300,0.05)',((plesiops_sp,(pseudochromis_flavivertex,pseudochromis_sankeyi)),(chromis_enchrysura,(gramma_loreto,(scartella_cristata,(labrisomus_nuchipinnis,emblemariopsis_sp)))))),((centropomus_medius,sphyraena_putnamae)'B(4.9,7.97,1e-300,0.05)',((polydactylus_sexfilis,(citharoides_macrolepis,((hippoglossina_oblonga,bothus_pantherinus),(parachirus_xenicus,poecilopsetta_plinthus)'B(4.12,6.66,1e-300,0.05)'))'B(4.9,7.97,1e-300,0.05)'),((toxotes_jaculatrix,(istiophorus_platypterus,mene_maculatus)'B(5.52,9.46,1e-300,0.05)'),((coryphaena_hippurus,rachycentron_canadum),(seriola_zonata,(alectis_indicus,(caranx_melampygus,(chloroscombrus_orqueta,alepes_kleinii))))'B(4.9,6.93,1e-300,0.05)')'B(5.417,8.08,1e-300,0.05)')))),((cephalopholis_argus,(hypoplectrus_puela,(taenianotus_triacanthus,pseudanthias_squamipinnis))),(micropterus_salmoides,((pempheris_schomburgkii,(cheimarrichthys_fosteri,((thalassoma_ballieui,(halichoeres_poeyi,halichoeres_radiatus)),(epibulus_insidiator,(scarus_trispinosus,scarus_ferrigineus))'B(1.19,6.33,1e-300,0.05)'))),(((pareques_acuminatus,micropogonias_undulatus),(lutjanus_goreensis,(haemulon_album,(haemulon_flavilineatum,haemulon_parra)))),(((pomacanthus_paru,(chaetodon_ocellatus,chaetodon_kleinii))'B(2.962,6.6,1e-300,0.05)',(naso_unicornis,(acanthurus_pyroferus,(acanthurus_bahianus,(acanthurus_bariene,acanthurus_olivaceus))))'B(4.9,6.93,1e-300,0.05)')'B(5.417,8.08,1e-300,0.05)',(antigonia_capros,((antennarius_striatus,(ogcocephalus_radiatus,cryptopsaras_couesii)'B(4.9,6.16,1e-300,0.05)'),(balistes_capriscus,(takifugu_occelatus,(tetraodon_suvatti,(canthigaster_margaritata,(canthigaster_rostrata,canthigaster_figueiredoi))))'B(3.202,5.85,1e-300,0.05)')'B(5.05,6.96,1e-300,0.05)'))))))))))))))'B(9.39,12.8,1e-300,0.05)')'B(9.8,14.3,1e-300,0.05)';
mcmctree median1extmcmctree.ctl
mcmctree median2extmcmctree.ctl
mcmctree median3extmcmctree.ctl

The control files referenced in the commands given above were completely identical to those used in the first round of median-rate partition analyses, except for the guide tree specifying topology and divergence time priors.

When the analyses run under the usedata = 2 option finished after 5:31 (median1ext), 1:25:50 (median2ext), and 17:26 (median3ext), respectively, and the names of the Hessian matrix files as well as the control file usedata values were modified in the way described above, new mcmcTREE analyses were launched. These finished in 39:27:41 (median1ext), 38:59:27 (median2ext), and 39:21:48 (median3ext), with the following results:

median1ext PAML time tree

median1ext PAML time tree

median2ext PAML time tree

median2ext PAML time tree

median3ext PAML time tree

median3ext PAML time tree

rcluster partitions

Given the discrepancies between the results based on the fastest and slowest partitions, PartitionFinder was re-run on the full original alignment under the rcluster algorithm. Unlike the k-means algorithm, rcluster requires a set of prespecified partitions that may be grouped together in various ways but not split. While such an a priori partitioning scheme can easily be specified for exon data (e.g., by locus, by codon position, or by both), there is no obvious biologically motivated equivalent for ultraconserved elements. Therefore, the alignment was split into blocks of length 100 bases using the following code, which also ensures that the output is in the format required by PartitionFinder configuration files:

options(scipen=999)
# This is important for preventing R from converting the numbers "100000", "200000", and "300000"
# into scientific notation, which is not allowed in PartitionFinder configuration files

# How many rows are needed?
n <- ceiling(302488/100)

partitionname <- vector()
for(i in 1:n) {
  partitionname <- append(partitionname, paste("partition", i, sep = ""))
}
# Verify length
length(partitionname) - n
## [1] 0
start <- c(1)
for(i in 1:(n-1)) {
  start <- append(start, 100*i)
}
# Verify length:
length(start) - n
## [1] 0
end <- c(99)
for(i in 1:(n-1)) {
  end <- append(end, 100*i + 99)
}
# Verify length:
length(end) - n
## [1] 0
# Modify the last data block
end[3025] <- 302488

df <- data.frame(partitionname, start, end)
# Uncomment to print the file
# for(i in 1:n) {
# write(paste(df[i,"partitionname"], " = ", df[i,"start"], " - ", df[i,"end"], ";", sep = ""), 
#      "/Users/David/Grive/Alfaro_Lab/PartitionFinder/rcluster/data_blocks.txt",
#      append=TRUE)
# }

This produced the configuration file below (first 30 lines shown):

alignment = acanthomorph-gblocks-clean-min-90-taxa;

user_tree_topology = ExaBayes_ConsensusExtendedMajorityRuleNewick_Acanthomorph.tre;

branchlengths = linked;

models = GTR+G;

model_selection = BIC;

[data_blocks]
partition1 = 1 - 99;
partition2 = 100 - 199;
partition3 = 200 - 299;
partition4 = 300 - 399;
partition5 = 400 - 499;
partition6 = 500 - 599;
partition7 = 600 - 699;
partition8 = 700 - 799;
partition9 = 800 - 899;
partition10 = 900 - 999;
partition11 = 1000 - 1099;
partition12 = 1100 - 1199;
partition13 = 1200 - 1299;
partition14 = 1300 - 1399;
partition15 = 1400 - 1499;
partition16 = 1500 - 1599;
partition17 = 1600 - 1699;
partition18 = 1700 - 1799;
partition19 = 1800 - 1899;

The configuration file shown above was run with the following command line options:

python PartitionFinder.py /home/analysis/partitionfinder-new/acanth-75p-rcluster/partition_finder.cfg --verbose --save-phylofiles --raxml

Note that the --rcluster-max and --rcluster-percent were left unspecified, effectively keeping them at their default values of 1000 and 10, respectively.

The list of final subsets given in the best-scheme file was manually extracted and saved into a new file, from which three subsets were selected at random by their IDs. The alignments selected were discarded if they did not reach a minimum length of 2000 sites, and the process was repeated until three sufficiently long alignments have been collected.

rcluster <- read.delim("/Users/David/Grive/Alfaro_Lab/PartitionFinder/rcluster/subsetid.txt", sep = "|", strip.white = T, header=TRUE)
sample(rcluster[, "subset.id"], 3, replace=FALSE)
## [1] 8538dddc1c7a73f9130172d7e145be26 0b7ea30f64258c92fe78ccc2e3a8c61f
## [3] 74acf181e866d4f4730023ae022126a5
## 95 Levels: 01ae33a19b7d0b8db59c43886d68d1bf ...
Subset ID Sites Name of chain
08d545f1bb470785ba7a046efdafcfe7 4900 rcluster1
dbfddb57bfeaeae901d79d97235e165d 3700 rcluster2
15466a299bd6e3c6e559a61d97fe0562 2100 rcluster3

PhyloBayes

As in previous cases, PhyloBayes was first run to estimate the birth-death hyperparameters, which would be fixed in the later calibrated runs:

../pb -d 08d545f1bb470785ba7a046efdafcfe7.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 rcluster1nocal
../pb -d dbfddb57bfeaeae901d79d97235e165d.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 rcluster2nocal
../pb -d 15466a299bd6e3c6e559a61d97fe0562.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 rcluster3nocal

The chains were paused on Sunday 02/19/2017, 12:22 PM (rcluster1nocal, at 18623 generations), 12:23 PM (rcluster2nocal, at 22580 generations), and 12:24 PM (rcluster3nocal, at 31709 generations). The ESS values obtained are given below:

parameter rcluster1 rcluster2 rcluster3
states 18623 22580 31709
loglik 8994 9717 17692
length 2445 1487 4290
sigma 7608 7490 19974
mu 1010 1880 2416
meanrates 2750 3148 11542
p1 7343 3393 19669
p2 2427 1478 9880
alpha 9135 10517 17294
nmode - - -
stat 2956 2714 4733
statalpha - - -
rrent 7305 7120 10937
meanrr 1279 1564 2283
kappa - - -
allocent - - -

Based on the longest autocorrelation times detected in the three analyses (rcluster1: mu, 16.59; rcluster2: p2, 13.75; rcluster3: meanrr, 12.50), the readdiv program was run as follows in order to estimate the mean posterior values of p1 and p2 as well as their standard deviations:

../readdiv -x 1862 50 rcluster1nocal
../readdiv -x 2258 50 rcluster2nocal
../readdiv -x 3171 50 rcluster3nocal

Resulting in the following estimates of the parameters of interest:

chain p1 p2 ESS
rcluster1 0.00118489 +/- 0.000991948 0.00045689 +/- 0.00025091 335
rcluster2 0.00166149 +/- 0.00130985 0.000593605 +/- 0.000266553 406
rcluster3 0.00151578 +/- 0.00120178 0.000427276 +/- 0.000240445 570

These values served as hyperparameters determining the shape of the birth-death prior used in the calibrated analyses:

../pb -d 08d545f1bb470785ba7a046efdafcfe7.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00118489 0.00045689 -cal calib_root_corrected -sb -gtr -dgam 4 rcluster1cal
../pb -d dbfddb57bfeaeae901d79d97235e165d.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00166149 0.000593605 -cal calib_root_corrected -sb -gtr -dgam 4 rcluster2cal
../pb -d 15466a299bd6e3c6e559a61d97fe0562.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00151578 0.000427276 -cal calib_root_corrected -sb -gtr -dgam 4 rcluster3cal

The calibrated analyses were soft-stopped on Tuesday 02/21/2017 at 11:52 PM (rcluster1cal; 10737 generations), 11:53 PM (rcluster2cal; 12740 generations), and 11:54 PM (rcluster3cal; 16351 generations). The following effective sample sizes were reported by Tracer:

parameter rcluster1 rcluster2 rcluster3
states 10737 12740 16351
loglik 4594 5358 8938
length 1423 865 2116
sigma 5989 4081 9860
mu 616 935 1348
meanrates 980 1581 4620
p1 - - -
p2 - - -
scale 657 598 3390
alpha 5968 5878 8867
nmode - - -
stat 1345 1397 1947
statalpha - - -
rrent 4098 3299 5291
meanrr 678 802 1240
kappa - - -
allocent - - -

These values corresponded to the maximum autocorrelation times of 15.69 (rcluster1: mu), 19.18 (rcluster2: scale), and 11.86 (rcluster3: meanrr). Based on those, readdiv analyses were launched using the following commands:

../readdiv -x 1074 30 rcluster1cal
../readdiv -x 1274 30 rcluster2cal
../readdiv -x 1635 30 rcluster3cal

The resulting time trees are shown below:

rcluster1 PhyloBayes time tree

rcluster1 PhyloBayes time tree

rcluster2 PhyloBayes time tree

rcluster2 PhyloBayes time tree

rcluster3 PhyloBayes time tree

rcluster3 PhyloBayes time tree

The underflows (in bold) and overflows (in italics) of the posterior mean divergence dates with respect to the calibration bounds are summarized in the table below:

Calibration Lower bound Upper bound Node number rcluster1 rcluster2 rcluster3
Aipichthys 98.0 128.82 119 125.479 121.048 124.336
Berybolcensis 49.0 109.29 132 86.7686 85.2699 82.6209
Calatomus 11.9 43.95 210 35.104 31.0816 34.3208
Chaetodontidae 29.62 59.26 220 59.3976 59.7278 55.4057
Eobuglossus 41.2 53.88 189 45.7935 44.9722 43.1575
Eucoelopoma 54.17 95.58 162 68.4367 63.5752 73.5542
Homonotichthys 93.9 116.35 123 115.157 115.413 114.048
Mcconichthys 63.1 93.51 124 48.2893 58.3972 63.1317
Ramphexocoetus 49.0 80.52 173 78.1323 74.7231 76.9205
Tarkus 49.0 53.93 229 50.2866 50.3269 51.2596
Mene 55.2 95.64 192 78.3756 67.9794 67.6891
Eastmanelepes 49.0 61.61 195 55.8886 54.9043 56.9354

More complete information on the overflows and underflows is available in the _sample.calib files.

The analyses were re-run under a new calibration file that contained the full 29-calibration point dataset as well as the corrected maximum placed on the (Polymixia + Aphredoderus) calibration:

../pb -d 08d545f1bb470785ba7a046efdafcfe7.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00118489 0.00045689 -cal extendcorrectcalib -sb -gtr -dgam 4 rcluster1ext
../pb -d dbfddb57bfeaeae901d79d97235e165d.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00166149 0.000593605 -cal extendcorrectcalib -sb -gtr -dgam 4 rcluster2ext
../pb -d 15466a299bd6e3c6e559a61d97fe0562.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00151578 0.000427276 -cal extendcorrectcalib -sb -gtr -dgam 4 rcluster3ext

These chains were soft-stopped on Thursday 02/23/2017 at 6:08 PM (rcluster1ext: 7025 generations), 6:21 PM (rcluster2ext: 8130 generations), and 6:22 PM (rcluster3ext: 10460 generations), after their parameters reached the following effective sample sizes:

parameter rcluster1ext rcluster2ext rcluster3ext
states 7025 8130 10460
loglik 2966 2972 6198
length 849 525 1225
sigma 4072 2749 6702
mu 526 578 1142
meanrates 468 1393 3837
p1 - - -
p2 - - -
scale 499 692 1815
alpha 3799 3523 5257
nmode - - -
stat 1101 1223 1405
statalpha - - -
rrent 2981 2309 4156
meanrr 592 413 939
kappa - - -
allocent - - -

The readdiv program was started using the commands shown below, which were based on the greatest autocorrelation times observed in Tracer (rcluster1ext: meanrates, 13.50; rcluster2ext: meanrr, 17.70; rcluster3ext: meanrr, 10.02):

../readdiv -x 703 30 rcluster1ext
../readdiv -x 813 30 rcluster2ext
../readdiv -x 1046 30 rcluster3ext

Yielding the following time trees:

rcluster1ext PhyloBayes time tree

rcluster1ext PhyloBayes time tree

rcluster2ext PhyloBayes time tree

rcluster2ext PhyloBayes time tree

rcluster3ext PhyloBayes time tree

rcluster3ext PhyloBayes time tree

In the table below, the lower and upper bounds placed on the calibrated nodes (including the root) are compared with the corresponding posterior mean ages, with overflows in italics and underflows in bold:

Calibration Lower bound Upper bound Node # rcluster1ext rcluster2ext rcluster3ext
Root 98.0 143.0 118 127.786 129.419 127.541
Hoplopteryx 93.9 128.0 129 122.357 124.034 123.49
Calatomus 11.9 63.3 210 38.2518 34.1469 43.5446
Triodon 50.5 69.6 230 58.5936 59.6328 66.5676
Archaeotetraodon 32.02 58.5 231 31.8787 32.4746 32.1466
Tarkus 49.0 61.6 229 50.1895 50.6793 53.2813
Luvarus 54.17 80.8 219 76.5659 71.8821 77.8735
Proacanthurus 49.0 69.3 222 51.7481 51.2993 53.0406
Chaetodontidae 29.62 66.0 220 74.0795 67.686 62.0793
Archaeus 54.17 80.8 193 57.4952 59.8435 66.1987
Eastmanalepes 49.0 69.3 195 54.8664 56.9666 62.56
Mene 55.2 94.6 192 73.8609 68.2343 68.6717
Eobothus 49.0 79.7 186 63.1293 62.7777 58.3339
Eobuglossus 41.2 66.6 189 44.2319 46.5881 42.7674
Sphyraena 49.0 79.7 183 78.4609 72.0311 77.8716
Rhamphexocoetus 49.0 79.7 173 75.7623 74.3786 75.5472
Mahengechromis 45.46 67.4 174 61.8942 64.2091 60.5732
Gasterorhamphosus 69.71 111.3 154 79.93 77.2257 84.5181
Prosolenostomus 49.0 93.7 155 73.8318 72.155 77.2461
Eocoelopoma 54.17 94.2 162 65.0129 64.8334 74.7656
Berybolcensis 49.0 93.7 132 81.1629 83.2881 76.0081
Aipichthys 98.0 143.0 119 123.691 120.623 121.403
Cretzeus 69.71 111.3 125 107.124 105.645 106.574
Rhinocephalus 53.7 94.4 128 59.9218 63.9852 62.5429
Zenopsis 32.02 91.4 126 35.779 34.08 36.6441
Homonotichthys 93.9 116.35 123 114.559 115.027 112.495
Massamorichthys 58.551 109.6 124 41.1684 51.1745 57.5285
Turkmene 54.17 122.5 120 55.727 55.6773 65.1835
Trachipterus 5.33 94.0 121 21.0129 16.9183 33.7002

More detailed information about the underflows and overflows is available in the _sample.calib files.

PAML / mcmcTREE

In the first step of the analysis, baseml was run on a tree without branch length information whose root was calibrated at 113.02 Ma (in the units of hundreds of millions of years, i.e., denoted as “1.1302”). The settings used in the baseml control file were identical to those described above.

The preliminary analyses finished in 3:08:57 (rcluster1), 1:21:47 (rcluster2), and 52:53 (rcluster3), yielding the following rates of substitution per 100 million years:

partition rcluster1 rcluster2 rcluster3
rate 0.151141 0.173575 0.099392

These were used to determine the values of the rgene_gamma prior in the following way (assuming an \(\alpha\) value of 2):

## [1] "beta_rcluster1 = 132.326767720208"
## [1] "beta_rcluster2 = 115.22396658505"
## [1] "beta_rcluster3 = 201.223438506117"

Leading to the \(\beta\) values shown below:

parameter median1 median2 median3
\(\alpha\) (shape) 2 2 2
\(\beta\) (rate) 132.33 115.22 201.22

The prior placed on the variance of the logarithm of the rate was retained from the previous PAML analyses (i.e., a gamma distribution with \(\alpha = 1\) and \(\beta = 10\)). The prior distributions assigned to the mean substitution rates of the three partitions are shown below. The means of the rgene_gamma and sigma2_gamma hyperpriors were then used as the mean and variation of the overall rate distributions shown further below:

Aside from the values of rgene_gamma and sigma2_gamma, the mcmcTREE control files created for the rcluster partitions were identical to those written for the “median” runs, including the use of the corrected calibrated tree. To estimate the Hessian matrix for branch lengths, mcmcTREE was initially run under the usedata = 3 option and finished up in 3:35 (rcluster1), 3:02 (rcluster2), and 1:15 (rcluster3). The Hessian matrices was stored in out.BV files, which were subsequently renamed as in.BV, with the usedata value changed accordingly to 2.

The mcmcTREE chains reached the length and sample sizes specified in their respective control files after 35:02:29 (rcluster1), 35:22:06 (rcluster2), and 34:42:23 (rcluster3), with the following results:

rcluster1 PAML time tree

rcluster1 PAML time tree

rcluster2 PAML time tree

rcluster2 PAML time tree

rcluster3 PAML time tree

rcluster3 PAML time tree

The analyses were then repeated under the extended set of calibrations that also incorporated the corrected value of the upper bound assigned to the (Polymixia + Aphredoderus) divergence. The initial analyses (run under the usedata = 3 option) finished up in 2:44 (rcluster1ext), 2:11 (rcluster2ext), and 1:03 (rcluster3ext). As in the previous analyses, the usedata value in the respective control files was changed to 2 and the out.BV files were renamed as in.BV to serve as an input for the next round of analyses, which was completed after 35:24:28 (rcluster1ext), 35:46:03 (rcluster2ext), and 35:01:07 (rcluster3ext). The resulting time trees are shown below:

rcluster1ext PAML time tree

rcluster1ext PAML time tree

rcluster2ext PAML time tree

rcluster2ext PAML time tree

rcluster3ext PAML time tree

rcluster3ext PAML time tree

BEAST

The rcluster partitions were analyzed in BEAST under a set of 13 calibration points (including one placed on the root, and with the corrected upper bound value of 116.35 Ma assigned to the (Polymixia + Aphredoderus) node), the random local clock model, and the fixed topology operator mix with default values. The length of all three chains was set to 500 million generations, sampling each 1000th generation:

beast -threads 12 -beagle_SSE rcluster1rlc.xml
beast -threads 12 -beagle_SSE rcluster2rlc.xml
beast -threads 12 -beagle_SSE rcluster3rlc.xml

(Note: Not finished because of the leontocebus power outage; not restarted – root age values did not seem promising.)

Best partition selected by SortaDate

PhyloBayes

The initial PhyloBayes run was conducted without calibrations and served to estimate the birth-death model hyperparameters:

../pb -d 8be7c94dcf2d71970c663f6710af40d7.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 bestnocal

The analysis was soft-stopped on Monday 04/24/2017 at 10:44 PM at 6137 generations, with the following ESS values:

parameter ESS
states 6137
loglik 3195
length 2230
sigma 1835
mu 340
meanrates 656
p1 523
p2 210
alpha 1767
nmode -
stat 2138
statalpha -
rrent 2576
meanrr 359
kappa -
allocent -

Based on these and on the maximum autocorrelation time shown by Tracer (p2; 26.34), the chain was summarized as follows:

../readdiv -x 614 27 bestnocal

This command provided the following summary of the posterior distributions for p1 and p2:

p1 p2
0.00127672 +/- 0.00105001 0.000816072 +/- 0.00030442

The calibrated analysis was thus run under the following settings:

../pb -d 8be7c94dcf2d71970c663f6710af40d7.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00127672 0.000816072 -cal calib_root_corrected -sb -gtr -dgam 4 bestcal

PAML / mcmcTREE

In order to make the analysis more directly comparable to the results obtained using BEAST (see below), a root calibration of 120.5 Ma (rather than 113.02 Ma) was used in baseml to estimate the mean susbtitution rate. The time unit was set to one hundred million years (i.e., the calibration was denoted as “1.205”). All other settings were identical to those employed in the baseml analyses reported above.

Baseml estimated the rate of evolution of the partition to be 0.237600 substitutions per one time unit (i.e., per 100 million years). The rate parameter (\(\beta\)) of rgene_gamma was set to a value such that \(\frac{2}{\beta}\) equals the rate above expressed in terms of tens of millions of years:

## [1] "beta_best = 84.1750841750842"

Resulting in the following rgene_gamma distribution:

The hyperprior assigned to sigma2_gamma was identical to that used in the previous analyses, with \(\sigma^2 \sim \Gamma\left(\alpha = 1, \beta = 10\right)\). The overall rate distribution can be visualized as follows (with the mean and standard deviation set to the expected values of the respective hyperpriors):

Next, a Hessian matrix was generated and stored in out.BV (mcmcTREE runtime = 2:27) using the following command:

mcmctree bestmcmctree.ctl

After the appropriate modifications of the control and output files, a second mcmcTREE analysis was started using the same command (runtime =

BEAST

The root was assigned an exponential prior whose mean was equal to the mean of the uniform prior used in PhyloBayes and in mcmcTREE, and which was truncated at 143 Ma to prevent BEAST from inferring excessively old dates for the root:

root <- (143 - 98)/2
paste("Root", root, sep = " = ")
## [1] "Root = 22.5"

The prior placed on the (Polymixia + Aphredoderus) node (calibrated by Homonotichthys) was constructed using the corrected maximum of 116.35 Ma. The analysis was run under the lognormal relaxed clock model and the fixed-topology operator mix, in which the operators acting on the ucld.mean and ucld.stdev parameters received tuning values of 0.9 and weights of 6.0. MCMC was run for 200 million generations, with a sampling frequency of 1 sample per 1000 generations.

beast -threads 12 -beagle_SSE bestpartition-run1.xml